[{"content":" Privacy Policy for www.hairizuan.com # Last updated: 5 September 2023\nAt hairizuan.com (hereinafter referred to as \u0026ldquo;we,\u0026rdquo; \u0026ldquo;us,\u0026rdquo; or \u0026ldquo;our\u0026rdquo;), we take your privacy seriously. This Privacy Policy outlines how we collect, use, disclose, and safeguard your personal information when you visit our blog, www.hairizuan.com. By accessing or using our blog, you consent to the practices described in this Privacy Policy.\n1. Information We Collect # 1.1. Personal Information: We may collect personal information such as your name, email address, and any other information you voluntarily provide when you comment on our blog or contact us via email.\n1.2. Non-Personal Information: We may also collect non-personal information, including but not limited to your IP address, browser type, and browsing patterns on our website.\n2. How We Use Your Information # 2.1. We may use your personal information for the following purposes:\nTo respond to your inquiries or comments. To send you newsletters or updates if you have subscribed to our mailing list. To improve our website and the content we provide. To analyze user behavior and trends. 2.2. We may use non-personal information for statistical purposes and to enhance the user experience on our blog.\n3. Cookies and Tracking Technologies # 3.1. We use cookies and similar tracking technologies to collect information about your browsing behavior and preferences. You can control the use of cookies through your browser settings.\n4. Disclosure of Your Information # 4.1. We do not sell, trade, or otherwise transfer your personal information to third parties unless we provide you with advance notice and obtain your consent.\n4.2. We may share non-personal information with third-party service providers for analytical purposes or to improve our website.\n5. Third-Party Links # 5.1. Our blog may contain links to third-party websites. 
We are not responsible for the privacy practices or content of these websites. Please review the privacy policies of these websites before providing them with any personal information.\n6. Security # 6.1. We take reasonable measures to protect your personal information from unauthorized access, disclosure, alteration, or destruction. However, no data transmission over the internet or storage system can be guaranteed to be 100% secure. Therefore, we cannot guarantee the absolute security of your data.\n7. Your Choices # 7.1. You may unsubscribe from our mailing list, and you may opt out of cookies by adjusting your browser settings.\n8. Changes to This Privacy Policy # 8.1. We may update this Privacy Policy from time to time. Any changes will be posted on this page with the updated date.\n9. Contact Us # 9.1. If you have any questions or concerns about this Privacy Policy, please contact us at hairizuanbinnoorazman@gmail.com.\nBy using our blog, you acknowledge that you have read and understood this Privacy Policy and agree to its terms and conditions.\n","date":"5 September 2023","externalUrl":null,"permalink":"/policy/","section":"Experiment, Fail, Learn, Repeat","summary":"Privacy Policy for www.hairizuan.com # Last updated: 5 September 2023\nAt hairizuan.com (hereinafter referred to as “we,” “us,” or “our”), we take your privacy seriously. This Privacy Policy outlines how we collect, use, disclose, and safeguard your personal information when you visit our blog, www.hairizuan.com. By accessing or using our blog, you consent to the practices described in this Privacy Policy.\n","title":"Privacy Policy","type":"page"},{"content":" Privacy Policy # The rgoogleslides package includes default Google credentials to make it easier to try out the package. 
(It is recommended that you use the client ID/client secret from your own Google project, though.)\nThis project does not collect any information from any services that you authorize using this package. To confirm this, you are free to inspect the code base of this R package.\n","date":"1 January 2019","externalUrl":null,"permalink":"/rgoogleslides/","section":"Experiment, Fail, Learn, Repeat","summary":"Privacy Policy # The rgoogleslides package includes default Google credentials to make it easier to try out the package. (It is recommended that you use the client ID/client secret from your own Google project, though.)\n","title":"R Googleslides Package","type":"page"},{"content":" Bio # Hairizuan is a Devops Engineer at Kiteworks. He is an avid fan of tools and technologies and has dabbled in various programming languages such as Golang, Python, Elm and R. He is currently one of the co-organizers of the GDG Cloud Singapore meetup group.\nSocial Profiles # https://www.linkedin.com/in/hairizuan-noorazman/ https://www.facebook.com/hairizuan.noorazman\nOne of the co-hosts of the GDG Cloud Singapore webinars\nhttps://www.youtube.com/c/GDGCloudSingapore\nSessionize - Call for papers\nhttps://sessionize.com/hairizuan/\nBooks # Author of the following books\nGolang for Jobseekers Talks # Using Command-Line Tools to Power Smarter Content Strategy # Everyone is rushing to build MCP servers and browser integrations for their AI agents — but sometimes the humble CLI is the smarter choice. In this session, Hairizuan will walk through how he wired a pre-built AI agent to two simple CLI tools: one pulling Google Analytics data, another querying Google Search Console. No servers, no OAuth dance on every run, just composable, scriptable, auditable pipelines. The result? An agent that reads what your audience is actually searching for and generates blog topic ideas grounded in real traffic signals — not vibes. 
We\u0026rsquo;ll cover when CLIs beat MCP, and when they don\u0026rsquo;t.\nSlides: https://docs.google.com/presentation/d/18uTVAW_O2LNNt2bDnE3DYFonbpJvMxynKK6cBTNprbU/edit?usp=sharing\nIntroducing LLMs in Devops # Introducing LLMs to automate Devops processes within Kiteworks. The focus is on deploying a chatbot that is backed by a vector database and retrieves better responses via a RAG (retrieval-augmented generation) mechanism.\nSlides: https://docs.google.com/presentation/d/141Yh_XS9u5g3CLgySv21eQj7fULB3hw4s-rJle75OvQ/edit?usp=sharing\nBuild your own code assessment platform but on Kubernetes # Building your own code assessment platform, but on Kubernetes. This involves creating an application that is deployed on Kubernetes and can spin up pods that run submitted code to check its \u0026ldquo;correctness\u0026rdquo;.\nSlides: https://docs.google.com/presentation/d/1XmNMDlMjcEETu-ybw-mnBjIu1igmCd084gQzxRZF3Js/edit?usp=sharing\nBack to Basics: Deploying an application on a server # A workshop session that covers the basics of deploying an application on a server, such as:\nscp of application artifacts to the server Setup of systemd for the application on the server Build your own redis # Redis changed its license, which led some companies to switch away from it. This makes it a good opportunity to look at how Redis works under the hood: how data is passed from application servers to Redis servers (that is, understanding the Redis protocol). We will then attempt to build a small Redis based on that, covering only the critical Redis APIs.\nSlides: https://docs.google.com/presentation/d/1qM8LUksshhiAkpdg6MLVRuWmY8gJ0beWU2IhWricFa8/edit?usp=sharing\nVideos: https://www.youtube.com/live/BaNEKiJ7blA?si=ga4jSZ0mAN9GA76J\u0026t=3220\nBuild your own code assessment platform # A session about building your own code assessment platform. 
There is a particular focus on the sandbox environment that runs submitted code, which is implemented with Docker for this talk, as well as on the security configuration needed for the Docker setup.\nSlides: https://docs.google.com/presentation/d/1aIRND0mP-42b2ZKcvEJXtEX-31UT66v1BwL61khl-Yo/edit?usp=sharing\nFeature flags can be surprisingly complicated # Feature flags are usually an afterthought when building applications, but plenty of developers think otherwise. There is now a small set of companies that provide feature flags as a service, and even a CNCF project that aims to standardize them so that users are not bound to a single provider. This talk covers this (and more if time permits).\nSlides: https://docs.google.com/presentation/d/1fNXmnvnCRZ5Wn9NOzqpX2SwJRjGzFXZuT9EtaY_IWlI/edit#slide=id.p\nBlock Youtube shorts with Chrome Extensions # A quick introduction to building a Chrome extension that blocks YouTube Shorts on the YouTube website. https://www.hairizuan.com/chrome-extension-to-get-rid-of-youtube-shorts/\nSlides: https://docs.google.com/presentation/d/1W6HUNWFyH0SE2PKPbfwWl6k7EBIJ66FQKQr_OAj47N0/edit#slide=id.p\nCode: https://github.com/hairizuanbinnoorazman/youtube-cleanup\nUsing Emulators for Testing Google Cloud Datastore # This talk covers testing an application that relies on Google Cloud Datastore locally. Because Google Cloud Datastore is a cloud-based service, this raises the question of how a developer can test against it locally, ideally without having to create a separate Google Cloud project just to test changes safely.\nSlides: https://docs.google.com/presentation/d/1qtzs2n5ChbXwi-ZhZtqwf_bSYApM1_5q49_mvrFRMrY/edit#slide=id.p\nDeploying apps using workload identity on GKE # A talk introducing the audience to deploying applications on Google Kubernetes Engine. 
The demo application needs to call Google APIs. The demo deploys the application to a Kubernetes cluster without needing a service account key file to authenticate its API requests.\nSlides: https://docs.google.com/presentation/d/1-Vsy_1PpQV5wJNyTYuw4OKhyLTt_w_imaT4XPoyIStg/edit#slide=id.p Slides (Devfest edition): https://docs.google.com/presentation/d/1MYLJDINrvph-XBW08oITfftWy0g9HE8lu2qoeF7ObMU/edit#slide=id.g26290ab0c45_0_86\nIntroduction to Cloud # A talk introducing people unfamiliar with the cloud to cloud technologies and platforms, approached from the angle of the feature sets that cloud platforms make available.\nVideo Stream: https://www.twitch.tv/videos/1039052712\nVideo: https://www.youtube.com/watch?v=N0UA7DgeFBY\nKubernetes HPA with Custom Metrics # A demo of how to use an application\u0026rsquo;s custom metrics to drive the Kubernetes Horizontal Pod Autoscaler (HPA)\nSlides: https://docs.google.com/presentation/d/159fA2Q12nSaldHD--0Ypln_tzruVNiM36edAAUaRoY0/edit?usp=sharing\nVideo Recording: https://www.youtube.com/watch?v=IxDqs7387YI\nDeploy via spreadsheet? Thats a bad idea # A demo of deploying apps into k8s clusters, controlled via Google Sheets\nSlides: https://docs.google.com/presentation/d/1YFnL9oirzsaVvqTVM6HshnfVFwqboRzv1KhytqCNQO0/edit#slide=id.p Video Recording: https://www.youtube.com/watch?v=4KZkBJFOgrQ\u0026t=5106s\nInteresting Features in GKE # Covers Workload Identity, Config Connector, Managed Application Delivery and more\nSlides: https://docs.google.com/presentation/d/1ptjcfpRuoKGqAsSr7O-USulBEcu9JdQpEZ3y3yQ7Tuk/edit?usp=sharing\nVideo Recording: https://www.youtube.com/watch?v=xlWX7iNKag8\nIntroduction to Skaffold # An introductory session to Skaffold covering the reasons for using it and how to get started with the tool quickly\nVideo: https://www.youtube.com/watch?v=xNq-aFohfgk\nSkaffold with Google Cloud Build # A talk on using Skaffold to deploy applications to Kubernetes clusters. 
Instead of using the local Docker engine, Google Cloud Build is used as the platform to build the image artifacts that are deployed to the cluster.\nThis is mostly a demo session.\nQuick tour of Knative # A quick tour of how Knative works under the hood. Knative is the platform that powers Google Cloud Run; this talk explores the various pieces of technology one would need to run when starting from plain old virtual machines.\nNote: The demo failed during this recording (due to a typo).\nEvent: Fossasia 2020\nBlog: https://www.hairizuan.com/trying-knative-from-scratch/\nVideo Recording: https://youtu.be/F71rvTQ8unA\nGenerating videos from slides on applications served from Google Cloud Run # Using Google Cloud Run to create a set of services that, combined, convert presentation slide PDFs and scripts (not programming scripts, but scripts of what to say during a presentation) into a video. The services are built with Google Cloud Run, Google Text-to-Speech, Google Cloud Storage and Google Datastore, all deployed via Google Cloud Build.\nSlides: https://docs.google.com/presentation/d/1Vuv7C1rNGbKdOvpJji5QNiqwvsPvHF6uWzNNEsIthlQ/\nGCSFuse; Heard of it # A lightning talk introducing GCSFuse and the common FUSE interface.\nSlides: https://docs.google.com/presentation/d/1MWEhJIHRgO60Bc-4HhjdIFSA1L2Z3aOu1lIZl6fWVqE\nIntroduction to Stackdriver # An introduction to Stackdriver, a Google Cloud Platform suite that provides monitoring, logging and profiling services. 
A Golang web application is used as an example to demonstrate how to set up these capabilities.\nSlides: https://docs.google.com/presentation/d/1JtV8N9in039VdtJzz7XN4UcxGHYuowxJQ0b8TOfGn54\nVideo Recording: https://engineers.sg/video/google-cloud-next-2019-singapore-using-stackdriver-effectively--3402\nIntroduction to Cloud Run # An introduction to Google Cloud Run, a newly announced serverless solution that lets one deploy any runtime and any software while Google manages it. The main part of this presentation is a demonstration of how to deploy a service to Google Cloud Run via the Google Cloud Console GUI.\nSlides: https://docs.google.com/presentation/d/1M8EhARDBY33IefEz356NhdUkkSyUZo1tHZBkMt-NtpE\nVideo: https://www.youtube.com/watch?v=n1wtjEmb7eI\nBlog Post: https://www.hairizuan.com/introduction-to-google-cloud-run/\nTriggering analytics with serverless functions # Using various serverless functions to trigger different workflows, and demonstrating how different Google Cloud Platform triggers can run analytical workloads. Presented during Google Cloud Devfest 2018, October 2018\nSlides: https://docs.google.com/presentation/d/1trt8SyQYSgUfx8AfHZ7Pt8_VzfIqEsJerpQYqhQ-MIw\nUsing Google Cloud Functions for Analytics Workloads # Creating a Slack bot that analyzes a single meetup event and returns graphs of its stats. The bot is backed by an API built on Google Cloud Functions. Presented during Google Cloud Next Extended Singapore 2018, August 2018\nVideo Recording: https://www.youtube.com/watch?v=OYv8nyA8pj8\nSlides: https://docs.google.com/presentation/d/1H05sgx7W83_NlNV2cGdjBUBsU1q1PuRSxwe3E88Ybyg/\nQuickstart Kubernetes # An overview of concepts such as Docker containers and Kubernetes terminology that are needed before introducing someone to Kubernetes. 
Presented during Google Devfest Singapore 2017, October 2017\nSlides: https://docs.google.com/presentation/d/1KW9jwpD10vNm7itrDZD8m0X9zya8jwISl_bPpYcw1_M\nFrom Analysis to Boardroom: Google Slides presentations via R # Covered the reasons for automating analysis work and walked through several code snippets for automating analysis with the R programming language. Presented during Google IO Extended 2017 event, July 2017\nThis was a demo-only session, so no slides are available.\nSessions # Date Event Name Event Link Topic 2026-04-28 GDG Singapore April Meetup https://gdg.community.dev/events/details/google-gdg-singapore-presents-gdg-monthly-meetup-2604-supercharge-your-agents/ Using Command-Line Tools to Power Smarter Content Strategy 2025-11-22 GDG Singapore Devfest 2025 https://gdg.community.dev/events/details/google-gdg-singapore-presents-devfest-singapore-2025-conference-1/cohost-gdg-singapore/ Introducing LLMs in Devops 2025-06-14 GDG Next Extended SG 2025 https://gdg.community.dev/events/details/google-gdg-singapore-presents-google-cloud-next-extended-singapore-2025/cohost-gdg-singapore Deploying apps using workload identity on GKE 2024-11-30 GDG Cloud Devfest KL 2024 https://gdg.community.dev/events/details/google-gdg-cloud-kl-presents-gdg-cloud-kl-devfest-2024/ Feature flags can be surprisingly complicated 2024-11-23 GDG Devfest Singapore 2024 https://gdg.community.dev/events/details/google-gdg-singapore-presents-devfest-singapore-2024-gemini-conference/ Build your own code assessment platform but on Kubernetes 2024-11-23 GDG Devfest Singapore 2024 https://gdg.community.dev/events/details/google-gdg-singapore-presents-devfest-singapore-2024-workshop/cohost-gdg-singapore Back to Basics: Deploying an application on a server 2024-11-16 GDG Cloud Devfest Surabaya 2024 https://gdg.community.dev/events/details/google-gdg-cloud-surabaya-presents-devfest-cloud-surabaya-2024/ Build your own redis 2024-08-29 GDG Cloud Singapore August Meetup 
https://gdg.community.dev/events/details/google-gdg-cloud-singapore-presents-gdg-cloud-singapore-august-meetup/ Build your own redis 2024-06-22 GDG Cloud KL IO Extended 2024 https://gdg.community.dev/events/details/google-gdg-cloud-kl-presents-gdg-cloud-kl-io-extended-2024/ Build your own redis 2024-06-01 GDG Cloud Singapore IO Extended 2024 https://gdg.community.dev/events/details/google-gdg-cloud-singapore-presents-google-io-extended-singapore-2024/ Build your own code assessment platform 2023-12-02 GDG KL Devfest 2023 https://gdg.community.dev/events/details/google-gdg-kuala-lumpur-presents-devfest-2023-kuala-lumpur/ Deploying apps using workload identity on GKE 2023-11-18 GDG Singapore Devfest 2023 https://sites.google.com/view/devfest-singapore-2023/speakers Feature flags can be surprisingly complicated 2023-10-14 Geekcamp SG 2023 https://geekcamp.sg/ Block Youtube shorts with Chrome Extensions 2023-07-29 I/O Extended Singapore 2023 https://gdg.community.dev/events/details/google-gdg-cloud-singapore-presents-google-io-extended-cloud-edition-2023/ Using Emulators for Testing Google Cloud Datastore 2023-07-18 KubernetesSG Meetup Jul 2023 https://www.meetup.com/k8s-sg/events/294559504/ Deploying apps using workload identity on GKE 2023-02-20 KubernetesSG Meetup Feb 2023 https://www.meetup.com/k8s-sg/events/291463340/ Kubernetes HPA with Custom Metrics 2022-12-15 GDSC MUM x Google Singapore https://gdsc.community.dev/events/details/developer-student-clubs-monash-university-malaysia-presents-gdsc-mum-x-google-singapore/ Introduction to Cloud Run 2022-05-12 Introduction to Cloud https://www.youtube.com/watch?v=N0UA7DgeFBY\u0026ab_channel=GoogleDeveloperStudentClubPSBAcademy Introduction to Cloud 2021-06-24 GDG Cloud Extended KL 2021 https://gdg.community.dev/events/details/google-gdg-cloud-kl-presents-google-io-extended-gdg-cloud-kl/ Introduction to Cloud Run 2020-10-31 GDG Cloud Devfest 2020 https://www.youtube.com/watch?v=4KZkBJFOgrQ Deploy via spreadsheet? 
Thats a bad idea 2020-06-23 June Devrel Google Cloud Talks No event link Skaffold with Google Cloud Build 2020-06-11 Kubernetes June 2020 Meetup https://www.meetup.com/Singapore-Kubernetes-User-Group/events/268492981/ Introduction to Skaffold 2020-05-07 GDG Cloud Singapore Webinar https://www.meetup.com/GDG-Cloud-Singapore/events/270423553 Interesting Features in GKE 2020-03-20 Fossasia 2020 https://2020.fossasia.org/event/schedule.html#6098 Quick tour of Knative 2019-11-09 GDG Cloud Singapore Devfest https://www.meetup.com/GDG-Cloud-Singapore/events/264449620 Introduction to Skaffold 2019-10-23 La Kopi - Serverless https://events.withgoogle.com/la-kopi-serverless/ Generating videos from slides on applications served from Google Cloud Run 2019-07-24 GDG Cloud Singapore Meetup July 2019 https://www.meetup.com/gdg-cloud-singapore/events/262726983 GCSFuse; Heard of it 2019-06-22 io19 Extended https://www.meetup.com/gdg-singapore/events/261587580/ Introduction to Cloud Run 2019-06-01 Cloud Next Extended SG - Data Edition https://www.meetup.com/gdg-cloud-singapore/events/258359490 Introduction to Stackdriver 2019-04-24 Cloud Next Extended SG https://www.meetup.com/gdg-cloud-singapore/events/259734957 Introduction to Cloud Run 2018-10-27 Cloud Devfest 2018 https://www.meetup.com/GCPUGSG/events/253546454/ Triggering analytics with serverless functions 2018-08-25 Google Cloud Next 2018 https://www.meetup.com/GCPUGSG/events/251921227/ Using Google Cloud Functions for Analytics Workloads 2017-10-28 GDG Devfest 2017 Singapore https://devfest17.peatix.com/ Quickstart Kubernetes 2017-07-01 I/O Extended 2017 Singapore https://peatix.com/event/258914 From Analysis to Boardroom: Google Slides presentations via R ","date":"1 January 2019","externalUrl":null,"permalink":"/speaker-profile/","section":"Experiment, Fail, Learn, Repeat","summary":"Bio # Hairizuan is a Devops Engineer at Kiteworks. 
He is an avid fan of tools and technologies and has dabbled in various programming languages such as Golang, Python, Elm and R. He is currently one of the co-organizers of the GDG Cloud Singapore meetup group.\n","title":"Speaker Profile","type":"page"},{"content":"This page lists the tools that have been built and embedded into this blog\nLifestyle # BMI Calculator Bus Arrival ","date":"1 January 2019","externalUrl":null,"permalink":"/tools/","section":"Experiment, Fail, Learn, Repeat","summary":"This page lists the tools that have been built and embedded into this blog\nLifestyle # BMI Calculator Bus Arrival ","title":"Tools","type":"page"},{"content":"Below is a collection of some of the projects I\u0026rsquo;ve been working on. These are not code snippets but tools and internal libraries I\u0026rsquo;ve built to solve issues I have had in the past\nR # rgoogleslides rgoogledrive Golang # Slides to Video Tool Syncer Tool Sample Golang Apps tasker sample gin bookcase application Web # Maintenance of GDG Cloud Singapore Website: https://www.gcpugsg.com/ ","date":"9 April 2014","externalUrl":null,"permalink":"/about/","section":"Experiment, Fail, Learn, Repeat","summary":"Below is a collection of some of the projects I’ve been working on. These are not code snippets but tools and internal libraries I’ve built to solve issues I have had in the past\n","title":"About Me","type":"page"},{"content":"","externalUrl":null,"permalink":"/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/categories/ai/","section":"Article Categories","summary":"","title":"Ai","type":"categories"},{"content":"When using Claude Code inside Worklayer\u0026rsquo;s terminal panel, I wanted it to be able to interact with web pages displayed in adjacent web panels. 
The standard approach would be to use the Playwright MCP server, but that spawns a separate Chromium instance outside the app. The page Playwright controls and the page the user sees are two different browser sessions with no shared state.\nI needed the AI to control the same webview the user is looking at. Same session, same cookies, same DOM. So I built a custom MCP server that talks to Electron\u0026rsquo;s webviews via the Chrome DevTools Protocol.\nArchitecture # The communication chain looks like this:\nClaude Code (terminal panel) | stdio JSON-RPC v MCP Server (Node.js process) | HTTP to localhost v Worklayer Main Process | webContents.debugger.sendCommand() v Webview CDP (Chrome DevTools Protocol) The MCP server runs as a standalone Node.js process spawned by Claude Code. It cannot use Electron\u0026rsquo;s IPC directly because it is not part of the Electron app. Instead, it makes HTTP requests to a local server that Worklayer\u0026rsquo;s main process already runs for browser interception.\nWhy HTTP Instead of WebSocket or IPC # Worklayer already had a local HTTP server for intercepting browser opens from CLI tools (the $BROWSER env var trick). Adding CDP routes to this existing server meant zero new dependencies or ports. Token-based auth was already in place. The MCP server just needs two environment variables: WORKLAYER_MCP_PORT and WORKLAYER_MCP_TOKEN, which are automatically set in the terminal\u0026rsquo;s environment when it spawns.\nWhy webContents.debugger Over --remote-debugging-port # Electron can expose a browser-wide debugging port with --remote-debugging-port, which is what you would connect Playwright to. But this exposes all webviews and requires target discovery and filtering. I wanted page-level access to a specific webview.\nElectron\u0026rsquo;s webContents.debugger API provides exactly this. You call wc.debugger.attach('1.3') on a specific webContents and then send CDP commands directly to it. 
No target discovery, no filtering, no risk of accidentally attaching to the wrong page. The webContents ID is already known because webviews register themselves on dom-ready.\nAccessibility Tree Over DOM Selectors # Following the pattern established by the Playwright MCP server, I use Accessibility.getFullAXTree to get a structured tree of the page rather than relying on CSS or XPath selectors. Each node gets a sequential UID that maps to a backendDOMNodeId for resolving click coordinates.\nThis works well with LLMs because the accessibility tree maps to how they reason about page structure. A tree node like [4] textbox \u0026quot;Search\u0026quot; value=\u0026quot;\u0026quot; is more meaningful to the model than a CSS selector like input.search-bar[data-testid=\u0026quot;search\u0026quot;]. It is also more stable across page updates.\nMutex for Tool Serialization # CDP commands can interfere if run concurrently. A navigation in the middle of a snapshot, for example, would produce an inconsistent snapshot. A simple promise-chain mutex ensures only one tool runs at a time:\nlet mutexPromise = Promise.resolve(); function withMutex(fn) { const prev = mutexPromise; let resolve; mutexPromise = new Promise((r) =\u0026gt; { resolve = r; }); return prev.then(fn).finally(resolve); } Every tool handler wraps its logic in withMutex. This is simpler than a full queue system and sufficient for the sequential nature of LLM tool calls.\nGotchas # A few things that were not obvious from the CDP documentation:\nwebContents.fromId() can return undefined if the webview was destroyed between listing panels and executing a command. Always check. Navigation history uses entry IDs, not array indices. Page.navigateToHistoryEntry takes the entryId from a history entry object returned by Page.getNavigationHistory. DOM.getBoxModel returns quads as flat arrays [x1,y1, x2,y2, x3,y3, x4,y4], not point objects. The center for clicking is the average of all four corners. Input.insertText does not fire keydown/keyup events. 
For form fields with JS event handlers, you may need individual Input.dispatchKeyEvent calls. CDP domains must be explicitly enabled before use. Call Page.enable, DOM.enable, Accessibility.enable after attaching the debugger. The Result # The MCP server exposes 17 tools: navigate, click, type, screenshot, snapshot, hover, fill, press key, select option, handle dialogs, upload files, resize viewport, network requests, console messages, evaluate JavaScript, and route/unroute for request mocking. Claude Code discovers it automatically via .mcp.json in the project root.\nThe key outcome is that what Claude does is exactly what the user sees. There is no second browser window, no session mismatch, no disconnect between the AI\u0026rsquo;s actions and the visible state. The user watches the web panel update in real time as the AI interacts with it.\n","date":"15 June 2026","externalUrl":null,"permalink":"/building-a-custom-mcp-server-with-chrome-devtools-protocol/","section":"Posts","summary":"When using Claude Code inside Worklayer’s terminal panel, I wanted it to be able to interact with web pages displayed in adjacent web panels. The standard approach would be to use the Playwright MCP server, but that spawns a separate Chromium instance outside the app. 
The page Playwright controls and the page the user sees are two different browser sessions with no shared state.\n","title":"Building a Custom MCP Server with Chrome DevTools Protocol","type":"posts"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/cdp/","section":"Technology Tags","summary":"","title":"Cdp","type":"tags"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/electron/","section":"Technology Tags","summary":"","title":"Electron","type":"tags"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/mcp/","section":"Technology Tags","summary":"","title":"Mcp","type":"tags"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/categories/tooling/","section":"Article Categories","summary":"","title":"Tooling","type":"categories"},{"content":"The MCP Grafana server previously relied on static API keys or basic auth for authenticating requests to Grafana. This works fine for local development or single-user setups, but falls apart once you have multiple users who each need their own Grafana permissions. Passing around shared API keys is a security concern and means everyone operates with the same access level regardless of their actual role.\nThe solution is to integrate OAuth/SSO so that each user authenticates with their own identity, and the MCP server forwards their access token to Grafana. Grafana already supports JWT auth, so the tokens issued by an OIDC provider can be validated directly by Grafana to determine the user\u0026rsquo;s role.\nArchitecture # The flow works like this:\nA client (e.g. 
Claude Code) obtains an OAuth access token from the identity provider (Keycloak in the dev setup) The client sends requests to the MCP Grafana server with Authorization: Bearer \u0026lt;token\u0026gt; The MCP server validates the token against the OIDC provider\u0026rsquo;s JWKS endpoint If valid, the token is forwarded to Grafana as the API key Grafana validates the JWT independently and maps the user to the correct org role based on claims This means the MCP server never handles user passwords and does not persist tokens. It acts as a transparent relay that validates tokens and passes them through.\nOAuth Middleware # The core of the implementation is an HTTP middleware that intercepts requests before they reach the MCP handler. It uses github.com/coreos/go-oidc/v3 to perform OIDC discovery and JWT verification.\nfunc OAuthProtectMiddleware(cfg OAuthServerConfig) (func(http.Handler) http.Handler, error) { ctx := context.Background() provider, err := oidc.NewProvider(ctx, cfg.Issuer) if err != nil { return nil, fmt.Errorf(\u0026#34;oauth: failed to discover OIDC provider at %s: %w\u0026#34;, cfg.Issuer, err) } verifierConfig := \u0026amp;oidc.Config{ SkipClientIDCheck: cfg.Audience == \u0026#34;\u0026#34;, } if cfg.Audience != \u0026#34;\u0026#34; { verifierConfig.ClientID = cfg.Audience } verifier := provider.Verifier(verifierConfig) return func(next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { rawToken := extractBearerToken(r) if rawToken == \u0026#34;\u0026#34; { w.Header().Set(\u0026#34;WWW-Authenticate\u0026#34;, `Bearer`) http.Error(w, \u0026#34;authorization required\u0026#34;, http.StatusUnauthorized) return } idToken, err := verifier.Verify(r.Context(), rawToken) if err != nil { w.Header().Set(\u0026#34;WWW-Authenticate\u0026#34;, `Bearer error=\u0026#34;invalid_token\u0026#34;`) http.Error(w, \u0026#34;invalid or expired token\u0026#34;, http.StatusUnauthorized) return } // Extract claims, check scopes, build user info, 
forward to next handler // ... }) }, nil } Provider discovery happens once at startup: go-oidc reads the issuer\u0026rsquo;s /.well-known/openid-configuration document to find its jwks_uri, then fetches the signing keys from that URI and caches them for subsequent validations.\nToken Forwarding to Grafana # Once the middleware validates a token, it stores the user info (including the raw token) in the request context. A bridge function then picks up this token and sets it as the Grafana API key so that downstream client creation uses it.\nfunc OAuthTokenForwardContextFunc(ctx context.Context, _ *http.Request) context.Context { userInfo, ok := OAuthUserInfoFromContext(ctx) if !ok || userInfo.Token == \u0026#34;\u0026#34; { return ctx } config := GrafanaConfigFromContext(ctx) config.APIKey = userInfo.Token return WithGrafanaConfig(ctx, config) } This is inserted into the composed context function chain between header extraction and client creation. As a result, if a user authenticates via OAuth, their token takes precedence over any env-based API key.\nRFC 9728 Discovery # MCP clients need a way to discover that the server requires OAuth and where to obtain tokens.
The server exposes a /.well-known/oauth-protected-resource endpoint following RFC 9728 that returns metadata pointing to the authorization server.\n{ \u0026#34;resource\u0026#34;: \u0026#34;http://localhost:8000/mcp\u0026#34;, \u0026#34;authorization_servers\u0026#34;: [\u0026#34;http://keycloak:8080/realms/grafana\u0026#34;] } Docker Compose Dev Environment # To test the full flow end-to-end locally, I created a Docker Compose setup under dev/ with three services:\nKeycloak — the identity provider, pre-configured with a grafana realm containing test users with different roles (admin, editor, viewer) Grafana — configured with both Generic OAuth (for browser login) and JWT auth (for API token validation) MCP Grafana — the MCP server with OAuth enabled, pointing at Keycloak as the issuer The Keycloak realm is imported from a JSON file that defines:\nA grafana client (confidential) for Grafana\u0026rsquo;s browser-based OAuth login An mcp-claude client (public, PKCE) for CLI tools to obtain tokens via authorization code flow Three test users: alice (admin), bob (editor), carol (viewer) Role mappings that Grafana uses to determine org roles from the roles claim Grafana JWT Auth Configuration # The tricky part is getting Grafana to accept the tokens that Keycloak issues. Grafana\u0026rsquo;s JWT auth needs the JWKS keys to verify signatures. 
Since Keycloak generates keys on startup, the Grafana container runs an init script that fetches the JWKS from Keycloak before starting Grafana.\n#!/bin/sh KEYCLOAK_URL=\u0026#34;${KEYCLOAK_URL:-http://keycloak:8080}\u0026#34; REALM=\u0026#34;${KEYCLOAK_REALM:-grafana}\u0026#34; echo \u0026#34;Waiting for Keycloak OIDC endpoint...\u0026#34; until wget -q -O /dev/null \u0026#34;${KEYCLOAK_URL}/realms/${REALM}/.well-known/openid-configuration\u0026#34; 2\u0026gt;/dev/null; do sleep 2 done echo \u0026#34;Fetching JWKS from Keycloak...\u0026#34; wget -q -O /etc/grafana/jwks.json \u0026#34;${KEYCLOAK_URL}/realms/${REALM}/protocol/openid-connect/certs\u0026#34; exec /run.sh The GF_AUTH_JWT_JWK_SET_FILE env var points Grafana at this downloaded JWKS file.\nRunning It # cd dev cp .env.example .env docker compose up -d After services are healthy, Keycloak is at localhost:8080, Grafana at localhost:3000, and the MCP server at localhost:8000. You can obtain a token from Keycloak using the password grant for testing:\ncurl -s -X POST http://localhost:8080/realms/grafana/protocol/openid-connect/token \\ -d \u0026#34;grant_type=password\u0026#34; \\ -d \u0026#34;client_id=mcp-claude\u0026#34; \\ -d \u0026#34;username=alice\u0026#34; \\ -d \u0026#34;password=alice123\u0026#34; \\ -d \u0026#34;scope=openid email profile\u0026#34; | jq -r \u0026#39;.access_token\u0026#39; Then use that token against the MCP server:\nTOKEN=$(curl -s ... | jq -r \u0026#39;.access_token\u0026#39;) curl -H \u0026#34;Authorization: Bearer $TOKEN\u0026#34; http://localhost:8000/mcp Server Flags # The OAuth support is opt-in via command line flags:\n--oauth-enabled — enable OAuth token validation --oauth-issuer — OIDC issuer URL for discovery --oauth-audience — expected audience claim (optional) --oauth-scopes-required — comma-separated required scopes (optional) --oauth-username-claim — which claim to extract as username (defaults to email) Without --oauth-enabled, the server behaves exactly as before. 
The same configuration can also be provided via environment variables (OAUTH_ENABLED, OAUTH_ISSUER, etc.).\nWhat\u0026rsquo;s Next # The current implementation covers the resource server side — validating tokens that clients have already obtained. The next step would be implementing the full MCP OAuth 2.1 client flow so that tools like Claude Code can automatically handle the authorization code + PKCE flow when they encounter a protected MCP server.\n","date":"25 May 2026","externalUrl":null,"permalink":"/adding-sso-to-mcp-grafana-server/","section":"Posts","summary":"The MCP Grafana server previously relied on static API keys or basic auth for authenticating requests to Grafana. This works fine for local development or single-user setups, but falls apart once you have multiple users who each need their own Grafana permissions. Passing around shared API keys is a security concern and means everyone operates with the same access level regardless of their actual role.\n","title":"Adding SSO to MCP Grafana Server","type":"posts"},{"content":"","date":"25 May 2026","externalUrl":null,"permalink":"/categories/golang/","section":"Article Categories","summary":"","title":"Golang","type":"categories"},{"content":"","date":"25 May 2026","externalUrl":null,"permalink":"/tags/golang/","section":"Technology Tags","summary":"","title":"Golang","type":"tags"},{"content":"","date":"25 May 2026","externalUrl":null,"permalink":"/categories/grafana/","section":"Article Categories","summary":"","title":"Grafana","type":"categories"},{"content":"","date":"25 May 2026","externalUrl":null,"permalink":"/tags/grafana/","section":"Technology Tags","summary":"","title":"Grafana","type":"tags"},{"content":"","date":"15 May 2026","externalUrl":null,"permalink":"/tags/productivity/","section":"Technology Tags","summary":"","title":"Productivity","type":"tags"},{"content":"I spend most of my working day jumping between a terminal, a browser, and an editor. 
Usually multiple instances of each, spread across different virtual desktops or hidden behind other windows. Every time I switch context between projects, there is a cost. I have to remember which terminal is running which service, which browser tab has the right dashboard, and which editor window has the file I was working on. The state is scattered.\nI wanted a single surface where one tab equals one project, and that tab contains everything I need: terminals, web panels, and file editors side by side. So I built Worklayer.\nThe Problem with Existing Tools # Window managers solve part of this. Tiling WMs like i3 or Hyprland keep windows organized spatially. But they operate at the OS level and treat every application as an opaque rectangle. They cannot group a terminal, a browser tab, and a file editor into a single switchable unit.\nIDE integrated terminals and browsers get closer, but they are locked into the IDE\u0026rsquo;s ecosystem. I wanted something that works with any web app, any shell, and any file, without being tied to VS Code\u0026rsquo;s extension model or IntelliJ\u0026rsquo;s project structure.\nOne Tab, Everything Together # Worklayer\u0026rsquo;s core concept is the workspace. A workspace is a named group of panels. Each panel is one of three types:\nTerminal panels — persistent shell sessions powered by xterm.js and node-pty. They survive workspace switches without losing scrollback. Web panels — embedded Chromium webviews with full navigation. Dashboards, documentation, pull request reviews, anything with a URL. File panels — a file browser plus Monaco editor with syntax highlighting and LSP support. Switching workspaces switches all panels at once. The mental model is simple: one workspace per unit of work. A debugging session might have a terminal running logs, a web panel showing Grafana, and a file panel open to the relevant source. 
When I switch to a different task, all three disappear together and the new task\u0026rsquo;s panels appear.\nScrollable Tiling from Niri # The layout inside a workspace is horizontally scrollable, inspired by Niri, a scrollable tiling Wayland compositor. Instead of forcing panels into a fixed grid that gets cramped as you add more, panels extend horizontally and you scroll to reach them. This means you can have as many panels as you need without any of them shrinking to unusable sizes.\nThe resize UX took several iterations to get right:\nDrag handles between panels let you adjust widths. They are deliberately wider than a typical 1px border so they are easy to grab. requestAnimationFrame throttling on resize events prevents layout thrashing. Without this, dragging a handle would fire hundreds of resize events per second and the UI would stutter. Double-click to expand — double-clicking a drag handle expands that panel to 2x its current width. Useful when you need to temporarily focus on one panel. Auto-scroll near edges — when dragging a handle near the left or right edge of the viewport, the panel strip automatically scrolls in that direction. This makes it possible to resize panels that are partially off-screen. Templates and Profiles # Once I had workspaces working well, I found myself recreating the same layouts repeatedly. Every time I started working on a specific microservice, I would create the same three panels with the same working directories and URLs.\nTemplates solve this. You save a workspace configuration as a template, and next time you can instantiate it in one click. The template preserves panel types, order, working directories, startup commands, and URLs.\nProfiles take this further by providing isolation boundaries. Each profile has its own set of workspaces, templates, and URL history. 
I use separate profiles for different teams or contexts, so work-related URLs and layouts do not leak into personal projects.\nWhat This Enables # The real payoff is not any single feature but the elimination of the constant low-level question: \u0026ldquo;where did I put that?\u0026rdquo; When everything for a task lives in one workspace, returning to that task after an interruption is instant. There is no archaeology of finding the right terminal among fifteen tabs.\nIt also changes how I interact with AI coding tools. Because Worklayer has a built-in MCP server, Claude Code running in a terminal panel can control an adjacent web panel directly. The AI can navigate, click, and screenshot without leaving the app. But that is a topic for a future post.\nThe source code is an Electron app with vanilla JavaScript, no React or framework overhead. The main dependencies are xterm.js for terminals, Monaco for the editor, and node-pty for native pseudoterminals. It builds to a macOS DMG targeting Apple Silicon.\n","date":"15 May 2026","externalUrl":null,"permalink":"/why-i-built-a-workspace-focused-electron-app/","section":"Posts","summary":"I spend most of my working day jumping between a terminal, a browser, and an editor. Usually multiple instances of each, spread across different virtual desktops or hidden behind other windows. Every time I switch context between projects, there is a cost. I have to remember which terminal is running which service, which browser tab has the right dashboard, and which editor window has the file I was working on. 
The state is scattered.\n","title":"Why I Built a Workspace-Focused Electron App","type":"posts"},{"content":"","date":"22 March 2026","externalUrl":null,"permalink":"/categories/devops/","section":"Article Categories","summary":"","title":"Devops","type":"categories"},{"content":"","date":"22 March 2026","externalUrl":null,"permalink":"/tags/dora/","section":"Technology Tags","summary":"","title":"Dora","type":"tags"},{"content":"Most online content regarding AI coding tools focuses heavily on input and output token counts. While these metrics are useful for understanding the raw volume of data processed, they often fail to address the actual effectiveness of those tokens in solving real-world engineering problems. Measuring the true impact of these tools on development workflows remains a challenge because volume does not equate to value.\nUnderstanding DORA Metrics # To measure engineering effectiveness, many organizations turn to DORA metrics (DevOps Research and Assessment). These are four key indicators that have become the industry standard for measuring software development and delivery performance:\nDeployment Frequency: How often your organization successfully releases to production. Lead Time for Changes: The time it takes for a commit to reach production. Change Failure Rate: The percentage of deployments causing a failure in production. Failed Service Recovery Time: How long it takes to restore service when a production failure occurs. While DORA metrics provide a high-level view of team performance and stability, they don\u0026rsquo;t explicitly account for the cost or efficiency of the tools used to achieve those results.\nMoving Beyond Raw Tokens: Attributed Usage # One potential approach to bridge this gap is to combine DORA metrics with a new measure: attributed token usage linked directly to issue trackers.\nCurrently, we know how many tokens we use globally, but we rarely know what they were used for. 
By attributing token consumption to specific Jira tasks, GitHub issues, or pull requests, we can begin to see the \u0026ldquo;cost of completion\u0026rdquo; for different types of work.\nFor example, we could track how many tokens (and their associated cost) are consumed to resolve a specific bug or implement a feature. If a particular bug costs $6-$8 in tokens to solve, that provides a tangible data point. This isn\u0026rsquo;t just about the financial cost; it\u0026rsquo;s about the \u0026ldquo;cognitive load\u0026rdquo; the AI is carrying to understand and solve that specific problem.\nCost as a Signal for Investigation # On a higher level, this attribution allows teams to identify if certain types of issues or specific parts of the codebase are incurring disproportionately high costs.\nComplex Legacy Code: If a relatively simple bug in a legacy module requires $20 of tokens to \u0026ldquo;explain\u0026rdquo; the context to the AI, it\u0026rsquo;s a strong signal that the code is too complex and might be a candidate for refactoring. Poorly Defined Requirements: A high token-to-resolution ratio on new features might indicate that the requirements are ambiguous, leading the AI (and the developer) to iterate through many failed attempts. Tool Inefficiency: It helps us evaluate whether a specific AI tool or model is actually effective for our specific tech stack compared to its cost. While this won\u0026rsquo;t be a perfectly accurate measure of complexity—AI costs fluctuate and models evolve—it serves as a valuable signal. If an issue\u0026rsquo;s token cost spikes, it\u0026rsquo;s a prompt for a human lead to investigate whether the tool is struggling with that specific context or if the problem itself is fundamentally flawed. 
By layering these cost insights over DORA\u0026rsquo;s velocity and stability metrics, we get a much clearer picture of our true engineering efficiency.\n","date":"22 March 2026","externalUrl":null,"permalink":"/measuring-coding-tool-effectiveness/","section":"Posts","summary":"Most online content regarding AI coding tools focuses heavily on input and output token counts. While these metrics are useful for understanding the raw volume of data processed, they often fail to address the actual effectiveness of those tokens in solving real-world engineering problems. Measuring the true impact of these tools on development workflows remains a challenge because volume does not equate to value.\n","title":"Measuring Coding Tool Effectiveness","type":"posts"},{"content":"","date":"22 March 2026","externalUrl":null,"permalink":"/tags/metrics/","section":"Technology Tags","summary":"","title":"Metrics","type":"tags"},{"content":"I\u0026rsquo;ve been wanting to reduce the amount of typing I do on a daily basis. Between writing messages, emails, and documentation - there\u0026rsquo;s a lot of text to produce. macOS does have a built-in dictation feature but I had concerns about its accuracy - it relies on a model running locally on the Mac, and I figured a cloud-based speech-to-text service would produce better results, especially for technical jargon and longer dictation sessions. On top of that, the built-in dictation doesn\u0026rsquo;t clean up filler words, and I wanted something that I could customize to my own needs. So I decided to build my own dictation app.\nThe idea # The concept is simple: press a hotkey to start recording, press it again to stop, and have the transcribed text injected directly into whatever app I\u0026rsquo;m currently using. 
No need to switch windows, no copy-pasting - just speak and the text appears where my cursor is.\nThe app runs as an accessory app (no Dock icon) with a floating overlay at the bottom of the screen that shows a waveform animation while recording and the transcription result when done.\nTech stack choices # I went with Swift and Swift Package Manager for this. Since this is a macOS-only app that needs deep integration with system-level features like global hotkeys and clipboard access, Swift felt like the natural choice. The app uses:\nAVFoundation for microphone capture (mono Int16 PCM audio) CoreGraphics for global hotkey interception via CGEventTap AppKit for the floating overlay panel with a custom waveform view AWS Transcribe Streaming for the actual speech-to-text AWS Bedrock (Claude Haiku) for cleaning up the raw transcription The global hotkey problem # One of the trickier parts was setting up the global hotkey. macOS requires CGEventTapCreate to intercept keyboard events globally, which means the app needs Input Monitoring permission. This is a system-level permission that the user has to grant manually through System Settings.\nThe hotkey I chose is Ctrl+Option+K - uncommon enough to not conflict with other shortcuts. I also had to add debouncing (300ms) because CGEventTap can fire multiple times for a single key press, which would cause the app to toggle recording on and off immediately.\nAudio capture and the TCC rabbit hole # Getting the microphone to work was surprisingly frustrating. macOS controls microphone access through TCC (Transparency, Consent, and Control), and the permission state can become stale. I ran into a situation where I had granted microphone permission in System Settings, but AVAudioEngine would hang for about 10 seconds and then fail with kAudioHardwareNotRunningError.\nThe fix? Restart the computer. macOS caches TCC permission state and sometimes the change doesn\u0026rsquo;t take effect until after a full reboot. 
This was not obvious at all and took a while to figure out.\nAnother thing I learned: for CLI apps built with SPM, the TCC permission is tied to the terminal application (e.g. Terminal, iTerm2), not the built binary itself. Embedding an Info.plist doesn\u0026rsquo;t help - you need to grant permission to whatever terminal you\u0026rsquo;re running the app from.\nThe transcription pipeline # The pipeline has two stages:\nAWS Transcribe Streaming - Takes the raw PCM audio and produces text. The audio is streamed in 16KB chunks via an AsyncThrowingStream. This works well because it means we don\u0026rsquo;t need to wait for the entire audio to upload before transcription starts.\nAWS Bedrock (Claude Haiku) - Takes the raw transcription and cleans it up. The raw output from speech-to-text often includes filler words like \u0026ldquo;umm\u0026rdquo;, \u0026ldquo;uh\u0026rdquo;, \u0026ldquo;like\u0026rdquo;, \u0026ldquo;you know\u0026rdquo; and has imperfect punctuation. The Bedrock call uses a system prompt that tells it to act as a \u0026ldquo;dumb text formatter\u0026rdquo; - only removing fillers and fixing grammar without changing the meaning.\nThis two-stage approach works quite well. The raw transcription is already decent from AWS Transcribe, and the cleanup pass from Haiku makes it read much more naturally.\nText injection # Once we have the cleaned text, we need to get it into whatever app the user was typing in. The approach is straightforward: copy the text to the clipboard via NSPasteboard, then simulate a Cmd+V paste using CGEvent. There\u0026rsquo;s a small 50ms delay between the clipboard write and the simulated paste to make sure the clipboard is ready.\nThis clipboard-based approach is a bit of a hack but it\u0026rsquo;s reliable and works across all applications. 
The downside is that it overwrites whatever was previously on the clipboard.\nState management # The app uses a simple state machine with four states: idle, starting, recording, and transcribing. This prevents issues like trying to start a new recording while one is already in progress, or pressing the hotkey while transcription is happening.\nThere\u0026rsquo;s also a 30-second safety timeout on the transcription step - if AWS Transcribe or Bedrock hangs for whatever reason, the overlay gets force-hidden and the app returns to idle. This prevents the app from getting stuck in a broken state.\nLearnings # A few things I picked up from this project:\nmacOS permissions are painful for CLI apps - TCC, Input Monitoring, and Accessibility permissions are all designed around .app bundles, not SPM executables run from a terminal. Expect to spend time debugging permission issues. Speech-to-text output benefits from a cleanup pass - Raw transcription is functional but messy. Running it through an LLM to clean up fillers and punctuation makes a big difference in quality for minimal cost (Haiku is cheap). Global hotkeys need debouncing - CGEventTap can fire multiple events for what feels like a single key press. Without debouncing, the app would toggle on and off immediately. Reboot after granting TCC permissions - If audio capture fails after granting microphone access, try restarting. The permission cache is real and will waste hours of your time if you don\u0026rsquo;t know about it. The source code is available on my GitHub if anyone wants to take a look or build something similar.\n","date":"15 March 2026","externalUrl":null,"permalink":"/building-a-dictation-app-with-swift/","section":"Posts","summary":"I’ve been wanting to reduce the amount of typing I do on a daily basis. Between writing messages, emails, and documentation - there’s a lot of text to produce. 
macOS does have a built-in dictation feature but I had concerns about its accuracy - it relies on a model running locally on the Mac, and I figured a cloud-based speech-to-text service would produce better results, especially for technical jargon and longer dictation sessions. On top of that, the built-in dictation doesn’t clean up filler words, and I wanted something that I could customize to my own needs. So I decided to build my own dictation app.\n","title":"Building a dictation app with Swift","type":"posts"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/categories/swift/","section":"Article Categories","summary":"","title":"Swift","type":"categories"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/swift/","section":"Technology Tags","summary":"","title":"Swift","type":"tags"},{"content":"","date":"1 February 2026","externalUrl":null,"permalink":"/categories/automation/","section":"Article Categories","summary":"","title":"Automation","type":"categories"},{"content":"","date":"1 February 2026","externalUrl":null,"permalink":"/tags/automation/","section":"Technology Tags","summary":"","title":"Automation","type":"tags"},{"content":"I\u0026rsquo;ve tried to build a bunch of AI agents at work for a variety of purposes and, along the way, learnt a couple of interesting lessons:\nShould it even be an agent? # Agents are definitely an exciting piece of tech, and various media outlets and blogs make them seem like the silver bullet that solves almost everything. However, as with every other supposed \u0026ldquo;silver bullet\u0026rdquo; in tech, building agents is not the silver bullet that people think it is. At the end of the day, whether something should be implemented as an agent depends on the problem.\nThe most important question to ask is this: do we need a fully deterministic result at the end of the process? 
If yes, then it\u0026rsquo;s best to use a plain old programming language and write a script for it. One can use AI to generate that script, but a script gives us a deterministic result (I\u0026rsquo;m ignoring side effects that could happen when running scripts, e.g. timing issues, which can make results flaky, but the outcome is still largely deterministic at the end of the day).\nIf the problem is more probabilistic in nature, e.g. debugging an issue where there could be multiple causes and solutions, building an agent could help. The agent could be tasked to dig through the various pieces of information and, with the right set of instructions, summarize the results in a useful way for an engineer to review. It serves well as a first pass when debugging an issue.\nWhere will this thing run? Who will run it? # If the agent will be run by a human, consider just using a coding harness such as Open Code or Claude Code. These coding agents can do a whole variety of activities, such as calling endpoints or running scripts (assuming you\u0026rsquo;ve given them permission).\nThere is also a benefit to making it accessible from Claude Code: we can combine multiple workflows into one, and this combination of instructions makes the tool interesting and powerful.\n","date":"1 February 2026","externalUrl":null,"permalink":"/learnings-from-building-agents/","section":"Posts","summary":"I’ve tried to build a bunch of AI agents at work for a variety of purposes and, along the way, learnt a couple of interesting lessons:\nShould it even be an agent? # Agents are definitely an exciting piece of tech, and various media outlets and blogs make them seem like the silver bullet that solves almost everything. However, as with every other supposed “silver bullet” in tech, building agents is not the silver bullet that people think it is. 
At the end of the day, whether something should be implemented as an agent depends on the problem.\n","title":"Learnings from building agents","type":"posts"},{"content":"Here are some of my learnings from using Claude Code. This will be a running document of learnings as we go along for the ride with this tool.\nLast update: 26 January 2026\nUtilizing Claude Plugins/Skills/Commands # Some prompts get used over and over again. One that I use on a day-to-day basis is, e.g., \u0026ldquo;Commit the changes that have been done so far with a summary of changes as the commit message and push it to remote\u0026rdquo;. This prompt is quite long and becomes a hassle to type over and over again. However, once we embed it as a slash command, it becomes trivial to recall. An even nicer fact is that, since a slash command is usually a Markdown file, we can provide more details and context on how the slash command should operate (although larger prompts naturally take up room in the context window; too big is also bad).\nAn interesting thing that was mentioned is that one can also use slash commands midway through a prompt. An example would be:\n/commit2remote - a slash command to commit changes with a summary of changes as the commit message and push to remote /run-linter - a slash command to run various linter checks So an example prompt could be: \u0026ldquo;Run /run-linter and then if there are no issues, /commit2remote\u0026rdquo; - but in general, I don\u0026rsquo;t really do this.\nNowadays, my main 2 slash commands are:\n/issue2code - Takes a GitHub issue and its description, reads it, and goes into planning mode to try to implement it as code /commit-pr - Commits the code with a summary of changes as the commit message, pushes to remote, and then creates an MR from it Updating the CLAUDE.md consistently # CLAUDE.md is the main file the model uses to understand a particular codebase. 
As the codebase evolves, we should naturally update CLAUDE.md so that the model can understand the codebase correctly without us constantly telling it that it\u0026rsquo;s doing something wrong. It can then start immediately with the correct understanding and standards.\nAn example could be a Python codebase that started out without any type hints on function parameters. Let\u0026rsquo;s say we have since done work to introduce typing across the codebase. If CLAUDE.md doesn\u0026rsquo;t state this explicitly, the model may generate functions that don\u0026rsquo;t fit that coding standard. If it is stated explicitly, the model will know and conform to it much better.\nTry Planning mode # This is inspired by this video: https://www.youtube.com/watch?v=B-UXpneKw6M\u0026pp=ygUUYm9yaXMgY2hlcm55IGFpIGxhYnPYBvwC\nUse planning mode when trying out an implementation. Planning produces pretty good output because of the exploratory steps that run beforehand. The exploration steps build up context, which can then be used to draft a deeper, more thorough plan. A nice part is that at the end of planning mode, there is an option to clear the context and ask the model to one-shot the implementation based on the plan.\nEstablish feedback loops # A Claude model can easily write the code in one go, but how would we know that the output works? This is where we can supercharge the process and remove the manual parts (especially QA).\nThe most straightforward path is to have integration tests, and to mention them in CLAUDE.md. Once the model knows about them, it will automatically run them and ensure that they pass. First it\u0026rsquo;ll implement the code, then it\u0026rsquo;ll run the integration tests. 
Once all tests pass, the model will declare that the task is done.\nOther examples of feedback loops:\nUI changes: either have Playwright UI tests, or just give the model a Playwright MCP server Creation of Jenkins jobs: have the model run the newly created Jenkins job until it succeeds Introducing abstractions # AI models are not cheap. If a problem is already somewhat solved, we can introduce abstractions to reduce the number of tokens spent just on fetching such data.\nE.g. let\u0026rsquo;s say we have a huge dataset to process. We shouldn\u0026rsquo;t dump the entire dataset into the model; that would just waste money on input and output tokens. The answer we get also doesn\u0026rsquo;t have a good chance of being correct, since models are not known to do math very well (remember the \u0026ldquo;count the number of r\u0026rsquo;s in strawberry\u0026rdquo; problem). Instead, we can have the model write a script for such tasks. We\u0026rsquo;ll get an intermediate output that we can inspect, and if it looks good, we can simply use it to get the result we want. A concrete example would be to have the model generate a SQL script which we can then use to query the dataset.\nAnother example: let\u0026rsquo;s say we have a large log to collect. The log is accessible via a particular endpoint, which could be documented in CLAUDE.md. If we let the model collect the log on its own, it might curl the endpoint and pull the log, and the entire log would easily end up in the model\u0026rsquo;s context. A better approach could be to have the model call a standardized function that retrieves the log and writes it to a file. 
And with that approach, there is no need for the model to guess how to retrieve the log.\nSend image instead of describing the problem # Inspiration from this: https://www.youtube.com/watch?v=M8kZLuukZgk\nApparently, Claude understands images quite well, though unfortunately it can\u0026rsquo;t generate them. We can take screenshots or drawings and pass them to the model. (See the feedback loop section on Playwright: some of the operations it might do are taking screenshots to confirm that the task is done or to understand the context of the problem.)\nRunning multiple claude code runs at one go # This one is also inspired by this video: https://www.youtube.com/watch?v=B-UXpneKw6M\u0026pp=ygUUYm9yaXMgY2hlcm55IGFpIGxhYnPYBvwC\nWe can set up a 4-pane iTerm layout on macOS to run 4 different sessions at once.\nThis allows faster code or output generation, but it moves the bottleneck elsewhere: in this case, to the pull request stage.\nKeep Claude Code coding harness updated consistently # Claude Code has been improving rapidly over the past few weeks. Some of the latest and greatest features were introduced only very recently (at the beginning of this year). Some interesting examples would be:\nSkills (a different approach to MCP tooling based on progressive disclosure) Task management Swarms I\u0026rsquo;m still struggling to keep up with everything that is happening in the market.\nSpec-driven development vs one shot vs breaking up tasks for agents # There are numerous ways to work with AI tooling.\nSpec-driven development - essentially, do a product management exercise to develop an in-depth requirements document covering the various features and the edge cases of each feature. The entire document can then be passed to AI tooling, and potentially an entire swarm of AI agents can cooperate to work on it. One-shot prompt - the approach that big companies sometimes market constantly - e.g. 
Anthropic creating a C compiler or Cloudflare creating a JS runtime based off Next.js Breaking up tasks for agents and passing one small task to an agent at a time I lean more towards the last option for developing tools/products. The first option involves too much work deciding on the various aspects of the product up front, and those decisions could go wrong once implementation starts (e.g. no library is available to support a feature); the implementation may not cover 100% of the spec, and the resulting change is too massive for a human to review. The second option feels like gambling - you\u0026rsquo;re simply relying on the AI to plan and research. It does get better, but it is highly likely that various assumptions will be made during implementation that one does not agree with.\n","date":"25 January 2026","externalUrl":null,"permalink":"/learnings-from-using-claude-code/","section":"Posts","summary":"Here are some of the learnings for using Claude Code. This will be a running document of learnings as we go along for the ride of using this tool\nLast update: 26 January 2026\n","title":"Learnings from using Claude Code","type":"posts"},{"content":"","date":"25 August 2025","externalUrl":null,"permalink":"/tags/devops/","section":"Technology Tags","summary":"","title":"Devops","type":"tags"},{"content":"","date":"25 August 2025","externalUrl":null,"permalink":"/categories/jenkins/","section":"Article Categories","summary":"","title":"Jenkins","type":"categories"},{"content":"","date":"25 August 2025","externalUrl":null,"permalink":"/tags/jenkins/","section":"Technology Tags","summary":"","title":"Jenkins","type":"tags"},{"content":"It is pretty important to understand how our Jenkins jobs are running. We can technically keep querying the Jenkins server via the Jenkins API, but that would mean parsing an ever-changing response, which could be quite a painful process to go through.
Instead, we can simply install 2 plugins: the Metrics and Prometheus Jenkins plugins.\nI have a small setup to demonstrate this, which sets up the following via docker-compose:\nJenkins master in a container Jenkins agent in a container Prometheus (to collect metrics) Grafana (to visualize the data from Prometheus) Reference to the setup is here: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/jenkins\nImportant step 1: Install required plugins # There are a few critical steps for monitoring Jenkins via Prometheus. The first is to install the Metrics and Prometheus plugins. Technically we can do this via the Jenkins UI, but with the setup mentioned above, we can do it \u0026ldquo;automatically\u0026rdquo; by listing the plugins in the plugins.txt file in the above repo: https://github.com/hairizuanbinnoorazman/Go_Programming/blob/master/Environment/jenkins/plugins.txt\nWith that, the plugins are installed during the docker build step and are available on the next start of the Jenkins master and agent servers.\nImportant step 2: Querying of prometheus data # The Prometheus data can be queried at the \u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/prometheus/ endpoint (the path can be configured differently as well). Refer to the following plugin page: https://plugins.jenkins.io/prometheus/\nFor Prometheus, we can use a static scrape configuration. Since we\u0026rsquo;re running the above setup via docker compose, the Jenkins master server can be reached via the jenkins hostname.\nglobal: scrape_interval: 1m evaluation_interval: 1m # A list of scrape configurations. scrape_configs: # The job name is added as a label `job=\u0026lt;job_name\u0026gt;` to all metrics scraped from this config.
- job_name: \u0026#39;jenkins\u0026#39; static_configs: - targets: [\u0026#39;jenkins:8080\u0026#39;] metrics_path: \u0026#34;/prometheus\u0026#34; The Jenkins server is exposed via port 8080. The metrics are exposed on /prometheus instead of the usual /metrics path.\nTechnically, this is enough to get data collection started. The actual visualization of the collected metrics is done on Grafana.\nImportant step 3: Viewing of jenkins data on grafana # We can then hook up Grafana to the Prometheus server to check that metrics are collected correctly. Once the metrics are collected, we can use the following dashboard: https://grafana.com/grafana/dashboards/9964-jenkins-performance-and-health-overview/ to get something going.\nThis could be a good PromQL query for getting the average duration of a job in seconds (the metric is in milliseconds, hence the division by 1000):\nincrease(default_jenkins_builds_duration_milliseconds_summary_sum{jenkins_job=\u0026#34;firstjob\u0026#34;}[5m])/increase(default_jenkins_builds_duration_milliseconds_summary_count{jenkins_job=\u0026#34;firstjob\u0026#34;}[5m])/1000 ","date":"25 August 2025","externalUrl":null,"permalink":"/monitoring-jenkins-via-prometheus/","section":"Posts","summary":"It is pretty important to understand how our Jenkins jobs are running. We can technically keep querying the Jenkins server via the Jenkins API, but that would mean parsing an ever-changing response, which could be quite a painful process to go through.
Instead, we can simply install 2 plugins: the Metrics and Prometheus Jenkins plugins.\n","title":"Monitoring Jenkins via Prometheus","type":"posts"},{"content":"","date":"20 August 2025","externalUrl":null,"permalink":"/categories/google-cloud/","section":"Article Categories","summary":"","title":"Google Cloud","type":"categories"},{"content":"","date":"20 August 2025","externalUrl":null,"permalink":"/tags/google-cloud/","section":"Technology Tags","summary":"","title":"Google Cloud","type":"tags"},{"content":"I need to deploy a metrics exporter to collect node metrics from instances and push them into a Grafana metrics dashboard\nWe can demonstrate this with 2 instances\nDeploy alloy to collect Node Metrics # We would first install Alloy on the instance we want to monitor. Here is the reference for it: https://grafana.com/docs/alloy/latest/set-up/install/linux/\nsudo apt install gpg sudo mkdir -p /etc/apt/keyrings/ wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg \u0026gt; /dev/null echo \u0026#34;deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main\u0026#34; | sudo tee /etc/apt/sources.list.d/grafana.list sudo apt-get update sudo apt-get install alloy sudo systemctl enable alloy sudo systemctl start alloy We then need to reconfigure Alloy via its configuration file: /etc/alloy/config.alloy\n// Sample config for Alloy.
// // For a full configuration reference, see https://grafana.com/docs/alloy logging { level = \u0026#34;info\u0026#34; } prometheus.exporter.unix \u0026#34;default\u0026#34; { include_exporter_metrics = true disable_collectors = [\u0026#34;mdadm\u0026#34;] } prometheus.scrape \u0026#34;default\u0026#34; { targets = array.concat( prometheus.exporter.unix.default.targets, [{ // Self-collect metrics job = \u0026#34;alloy\u0026#34;, __address__ = \u0026#34;127.0.0.1:12345\u0026#34;, }], ) forward_to = [ prometheus.remote_write.default.receiver, ] } prometheus.remote_write \u0026#34;default\u0026#34; { endpoint { url = \u0026#34;http://10.X.X.X:9090/api/v1/write\u0026#34; } } Deploy prometheus and grafana on Second instance # This installs Grafana:\nsudo apt-get install -y apt-transport-https software-properties-common wget sudo mkdir -p /etc/apt/keyrings/ wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg \u0026gt; /dev/null echo \u0026#34;deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main\u0026#34; | sudo tee -a /etc/apt/sources.list.d/grafana.list # Updates the list of available packages sudo apt-get update # Installs the latest OSS release: sudo apt-get install grafana # The systemd service installed by the package is named grafana-server sudo systemctl enable grafana-server sudo systemctl start grafana-server This installs Prometheus:\nsudo useradd -M -U prometheus wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz tar -xzvf prometheus-3.5.0.linux-amd64.tar.gz sudo mv prometheus-3.5.0.linux-amd64 /opt/prometheus sudo chown prometheus:prometheus -R /opt/prometheus We then need to create a systemd unit file for Prometheus at /etc/systemd/system/prometheus.service:\n[Unit] Description=Prometheus Server Documentation=https://prometheus.io/docs/introduction/overview/ After=network-online.target [Service] User=prometheus Group=prometheus Restart=on-failure ExecStart=/opt/prometheus/prometheus \\
--config.file=/opt/prometheus/prometheus.yml \\ --web.enable-remote-write-receiver \\ --storage.tsdb.path=/opt/prometheus/data \\ --storage.tsdb.retention.time=30d [Install] WantedBy=multi-user.target Take note that the --web.enable-remote-write-receiver flag above allows this Prometheus to accept metrics pushed from other sources (such as Alloy via remote write).\nIn order to expose Grafana, we may need to ensure that port 3000 is exposed publicly (we can\u0026rsquo;t easily use port 80 - that would mean Grafana would need to run as the root user).\nConclusion # After starting everything, we can check that everything is set up correctly by doing the following:\nLog in to Grafana with the default credentials (admin / admin) Add the node exporter dashboard 1860 - https://grafana.com/grafana/dashboards/1860-node-exporter-full/ Check that the metrics are coming in via the Explore panel (Grafana) ","date":"20 August 2025","externalUrl":null,"permalink":"/using-alloy-and-grafana-for-extracting-metrics-and-pushing-to-dashboard/","section":"Posts","summary":"I need to deploy a metrics exporter to collect node metrics from instances and push them into a Grafana metrics dashboard\nWe can demonstrate this with 2 instances\nDeploy alloy to collect Node Metrics # We would first install Alloy on the instance we want to monitor. Here is the reference for it: https://grafana.com/docs/alloy/latest/set-up/install/linux/\n","title":"Using Alloy and Grafana for extracting metrics and pushing to dashboard","type":"posts"},{"content":"I have a small engineering problem to resolve, which is to export logs from an Android application and save them into a monitoring stack of sorts. The logs are mostly for debugging purposes, because it\u0026rsquo;s a pure pain to chat with the user holding the phone in order to debug an issue. Technically, I could use tools like Sentry that retrieve logs more automatically, but that would require sending logs to the cloud more consistently.
The application currently generates a lot of logs over long periods, so there is a slight fear that enabling that might take up too much of the Android application\u0026rsquo;s bandwidth. (I should also mention that the application operates with very limited bandwidth - logs are a nice-to-have and only used in debugging cases, which technically don\u0026rsquo;t happen often.)\nRight now, my idea is to have the Android app export logs for a time period, zip them and send them over to a server, which would then send them to my monitoring stack (the usual Grafana stack - who doesn\u0026rsquo;t use it?)\nThis blog post only covers part 1 of this endeavour, which is to push logs to the monitoring stack. Maybe I\u0026rsquo;ll cover, in another blog post, some simple code that one can add to an Android app to zip logs and send them to the server, which parses and processes them before sending them to the monitoring stack.\nHere is a setup that can be used to test this.\nHere is the docker compose setup for it:\nversion: \u0026#34;3.8\u0026#34; services: loki: image: grafana/loki:2.9.2 container_name: loki # It keeps complaining of being unable to mkdir folders due to permissions user: \u0026#34;0\u0026#34; ports: - \u0026#34;3100:3100\u0026#34; command: -config.file=/etc/loki/local-config.yaml volumes: - ./hehe2.yaml:/etc/loki/local-config.yaml:ro - loki-folder:/loki grafana: image: grafana/grafana:10.2.3 container_name: grafana ports: - \u0026#34;3000:3000\u0026#34; depends_on: - loki environment: - GF_SECURITY_ADMIN_USER=admin - GF_SECURITY_ADMIN_PASSWORD=admin volumes: - grafana-data:/var/lib/grafana volumes: grafana-data: loki-folder: I\u0026rsquo;ll need to add the Grafana datasource manually via the UI for the above setup.\nHere is the loki config file (saved as hehe2.yaml):\nauth_enabled: false server: http_listen_port: 3100 ingester: lifecycler: address: 127.0.0.1 ring: kvstore: store: inmemory replication_factor: 1 final_sleep: 0s chunk_idle_period:
5m chunk_retain_period: 30s max_transfer_retries: 0 schema_config: configs: - from: 2020-10-24 store: boltdb-shipper object_store: filesystem schema: v11 index: prefix: index_ period: 24h storage_config: boltdb_shipper: active_index_directory: /loki/index cache_location: /loki/cache shared_store: filesystem filesystem: directory: /loki/chunks compactor: working_directory: /loki/compactor shared_store: filesystem limits_config: enforce_metric_name: false reject_old_samples: true reject_old_samples_max_age: 168h After the setup above is done, one can run the following curl request:\ncurl -X POST \u0026#34;http://localhost:3100/loki/api/v1/push\u0026#34; \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -d \u0026#39;{ \u0026#34;streams\u0026#34;: [ { \u0026#34;stream\u0026#34;: { \u0026#34;job\u0026#34;: \u0026#34;demo\u0026#34;, \u0026#34;app\u0026#34;: \u0026#34;example\u0026#34; }, \u0026#34;values\u0026#34;: [ [\u0026#34;1756655381000000000\u0026#34;, \u0026#34;This is a backfilled log line\u0026#34;] ] } ] }\u0026#39; The timestamp is a Unix epoch timestamp in nanoseconds, passed as a string\nWe will probably need to modify the timestamp field to reuse the above example; take note that we configured Loki to reject old samples - it only accepts samples that are up to 168 hours (7 days) old.\n","date":"15 August 2025","externalUrl":null,"permalink":"/backfilling-logs-on-loki-grafana-stack/","section":"Posts","summary":"I have a small engineering problem to resolve, which is to export logs from an Android application and save them into a monitoring stack of sorts. The logs are mostly for debugging purposes, because it\u0026rsquo;s a pure pain to chat with the user holding the phone in order to debug an issue. Technically, I could use tools like Sentry that retrieve logs more automatically, but that would require sending logs to the cloud more consistently.
The application currently generates a lot of logs over long periods, so there is a slight fear that enabling that might take up too much of the Android application\u0026rsquo;s bandwidth. (I should also mention that the application operates with very limited bandwidth - logs are a nice-to-have and only used in debugging cases, which technically don\u0026rsquo;t happen often.)\n","title":"Backfilling logs on Loki (Grafana Stack)","type":"posts"},{"content":"","date":"15 August 2025","externalUrl":null,"permalink":"/categories/docker/","section":"Article Categories","summary":"","title":"Docker","type":"categories"},{"content":"","date":"15 August 2025","externalUrl":null,"permalink":"/tags/docker/","section":"Technology Tags","summary":"","title":"Docker","type":"tags"},{"content":"One of the major things I was researching for security when distributing software is the capability to answer \u0026ldquo;is this software produced by your company?\u0026rdquo; This led me down a rabbit hole into the signing mechanisms for containers. The signing mechanism is somewhat similar to installing packages from rpm or deb repos on the various Linux distributions - there is a need to ensure that the package received is truly from the correct source.\nI once read about a \u0026ldquo;Notary\u0026rdquo; tool for containers, but apparently Notary v1 is no longer a recommended tool.
There are tools such as cosign and notation, but a quick search on Google and ChatGPT seems to indicate that cosign might be the better tool for now.\nThis post covers the commands for signing containers and moving them between different registries.\nWe need to install 3 main tools here:\ndocker (container runtime) skopeo (mostly for copying containers between registries) cosign (for signing containers) # Start 2 different registries docker run -d -p 5000:5000 registry docker run -d -p 5001:5000 registry # Pull nginx container and push to our local registry docker pull nginx:latest docker tag nginx:latest localhost:5000/nginx:latest docker push localhost:5000/nginx:latest # Generate a key pair (creates cosign.key and cosign.pub in the current directory) cosign generate-key-pair # Sign the container on registry 1 (the one that is exposed via port 5000) cosign sign --key cosign.key --tlog-upload=false localhost:5000/nginx:latest # Copy the container from registry 1 to registry 2 skopeo copy --dest-tls-verify=false --src-tls-verify=false \\ docker://localhost:5000/nginx:latest \\ docker://localhost:5001/nginx:latest # Copy the container signature from registry 1 to registry 2 # The signature tag is derived from the image digest - replace it with your own image\u0026#39;s digest skopeo copy --dest-tls-verify=false --src-tls-verify=false \\ docker://localhost:5000/nginx:sha256-3651f5785567a226fd58e33adcfb27b41a83ba0c3649d9ee9ac590acd97bad51.sig \\ docker://localhost:5001/nginx:sha256-3651f5785567a226fd58e33adcfb27b41a83ba0c3649d9ee9ac590acd97bad51.sig # Verify the container signature of the pushed nginx container straight on registry 2 # Note that we didn\u0026#39;t sign the container on registry 2 cosign verify --key cosign.pub --insecure-ignore-tlog=true localhost:5001/nginx:latest Issue with Skopeo # I first tried to install skopeo via plain old apt.
However, I faced the following issue.\nFATA[0000] creating an updated image manifest: preparing updated manifest, layer \u0026#34;sha256:803acddaac35131e459cb398d6c900b136afec849b1dcb6e4d14c5a27569cdad\u0026#34;: unsupported MIME type for compression: application/vnd.dev.cosign.simplesigning.v1+json Apparently, the main cause of this is the version - a newer version of skopeo doesn\u0026rsquo;t face this issue. My WSL2 environment is VERSION=\u0026quot;22.04.2 LTS (Jammy Jellyfish)\u0026quot;, whose apt repositories only install skopeo version 1.4.1.\nIt doesn\u0026rsquo;t seem to be possible to install a newer skopeo directly. The only way seems to be to clone the repo and build the tool on my own machine. Static building looks too troublesome, so we will rely on a binary that is dynamically linked to the libraries on my machine.\nsudo apt install libgpgme-dev libassuan-dev libbtrfs-dev pkg-config # Clone the source (the build also needs a Go toolchain) git clone https://github.com/containers/skopeo cd skopeo # Cannot use this one - it seems to run the build inside a container # However, building it in a container results in it linking to a newer glibc - mine is older # sudo make binary # Use the following make command instead sudo make bin/skopeo sudo mv bin/skopeo /usr/local/bin/ sudo chmod +x /usr/local/bin/skopeo ","date":"31 July 2025","externalUrl":null,"permalink":"/container-signing-experimentation/","section":"Posts","summary":"One of the major things I was researching for security when distributing software is the capability to answer “is this software produced by your company?” This led me down a rabbit hole into the signing mechanisms for containers.
The signing mechanism is somewhat similar to installing packages from rpm or deb repos on the various Linux distributions - there is a need to ensure that the package received is truly from the correct source.\n","title":"Container Signing Experimentation","type":"posts"},{"content":"There is a technical challenge and interesting requirement in my job that requires a lightweight snapshot capability for a folder/set of files. Technically, it should be OK to simply create a volume snapshot with the cloud vendor - however, creating such snapshots takes a lot of time and potentially a lot of space; it\u0026rsquo;s not the cheapest solution for this.\nHowever, I do see online that there are a bunch of different filesystems on the market. The common ones that we generally use are ext3 or ext4 -\u0026gt; those are journalling file systems. They are file systems focused on reliability over performance - reliability in the sense that if anything goes wrong, the data is not lost (and this is definitely a primary aim of a storage solution - to store stuff and ensure that it doesn\u0026rsquo;t go missing). There is another kind of file system though - the CoW (Copy on Write) file systems. I mostly started hearing about them due to Docker, where there is a need to ensure that the layers of a container are lightweight. A primary feature of some of them is the capability to create very quick, lightweight snapshots of the file system and to review and roll back as and when necessary. We can think of it as similar to doing incremental backups (to save resources) on file systems. Most companies do this via external systems on the usual file systems like ext3 or ext4, but with CoW file systems, the snapshots are baked into the file system itself.\nLet\u0026rsquo;s go straight to the demo\nSetting up the machine # We first need to set up the machine.
I mostly use Rocky environments at work, so I\u0026rsquo;m going to stick to them for this post.\nCreate a VM on Google Cloud Compute Change OS to Rocky 9 Choose e2-standard-4 (the reason for choosing a bigger instance is that installing some of the packages takes quite a bunch of resources) Add 2 extra volumes with 30GB disk (these will be used for ZFS) Setting up ZFS on Rocky 9 # sudo dnf install epel-release -y sudo dnf install -y https://zfsonlinux.org/epel/zfs-release-2-2.el9.noarch.rpm # Apparently, ZFS is supported on LTS kernels # The development tools and kernel headers below allow the ZFS module to be built for different kernel versions sudo dnf groupinstall \u0026#34;Development Tools\u0026#34; -y sudo dnf install kernel-devel -y sudo dnf install zfs -y # Reboot machine so that the zfs module is recognized sudo reboot Creating the file system with ZFS # sudo fdisk -l # $ sudo fdisk -l # Disk /dev/sdc: 30 GiB, 32212254720 bytes, 62914560 sectors # Disk model: PersistentDisk # Units: sectors of 1 * 512 = 512 bytes # Sector size (logical/physical): 512 bytes / 4096 bytes # I/O size (minimum/optimal): 4096 bytes / 4096 bytes # Disk /dev/sdb: 30 GiB, 32212254720 bytes, 62914560 sectors # Disk model: PersistentDisk # Units: sectors of 1 * 512 = 512 bytes # Sector size (logical/physical): 512 bytes / 4096 bytes # I/O size (minimum/optimal): 4096 bytes / 4096 bytes sudo zpool create -m /usr/share/pool new-pool /dev/sdb /dev/sdc # $ df # Filesystem 1K-blocks Used Available Use% Mounted on # devtmpfs 4096 0 4096 0% /dev # tmpfs 8055276 0 8055276 0% /dev/shm # tmpfs 3222112 8716 3213396 1% /run # efivarfs 56 24 27 48% /sys/firmware/efi/efivars # /dev/sda2 20699136 5433124 15266012 27% / # /dev/sda1 204580 7208 197372 4% /boot/efi # tmpfs 1611052 4 1611048 1% /run/user/1000 # new-pool 59932288 128 59932160 1% /usr/share/pool sudo zpool status # $ sudo zpool status # pool: new-pool # state: ONLINE # config: # NAME STATE READ WRITE CKSUM # new-pool ONLINE 0 0 0 # sdb ONLINE 0 0 0 # sdc ONLINE 0 0 0 #
errors: No known data errors # So that we can view the read-only snapshot folders sudo zfs set snapdir=visible new-pool At this stage, we can already start using the ZFS file system\nUsing ZFS and using snapshotting capabilities # zfs list -t snapshot # no datasets available cd /usr/share/pool sudo mkdir data # Replace xxx with your user sudo chown xxx:xxx data cd data touch example.txt # Creating snapshot testing01 # This will have only example.txt in data folder sudo zfs snapshot new-pool@testing01 touch testing02.txt touch example02.txt sudo zfs snapshot new-pool@testing02 base64 /dev/urandom | head -c 100 \u0026gt; example03.txt base64 /dev/urandom | head -c 100 \u0026gt; testing03.txt sudo zfs snapshot new-pool@testing03 base64 /dev/urandom | head -c 100000 \u0026gt; example02.txt base64 /dev/urandom | head -c 100000 \u0026gt; testing02.txt sudo zfs snapshot new-pool@testing04 # View the read-only snapshots of the file system cd /usr/share/pool/.zfs/snapshot # View the difference between snapshots of the file system $ sudo zfs diff new-pool@testing03 new-pool@testing04 M /usr/share/pool/data/testing02.txt M /usr/share/pool/data/example02.txt Some interesting things:\nWe can\u0026rsquo;t set a symbolic link to a snapshot - it\u0026rsquo;ll complain that the file system being linked to is a read-only file system, so that won\u0026rsquo;t work. Cleanup # We can simply delete the instance once we are done. An important thing to note here is that we also need to remove the additional disks used to back the ZFS file system - these are not automatically removed.\nUntested # These are some commands I haven\u0026rsquo;t fully understood or tested yet\n# To bind mount a snapshot # A previous attempt to mount the snapshot resulted in difficulty unmounting it (due to a busy device) # Also, forcing an unmount via umount -l led to us not being able to access the dir in the .zfs snapshot folder.
It complains of too many symbolic links (probably due to bad commands and procedures) sudo mkdir /mnt/snap1 sudo mount --bind /tank/mydata/.zfs/snapshot/snap1 /mnt/snap1 ","date":"20 July 2025","externalUrl":null,"permalink":"/trying-zfs-filesystems/","section":"Posts","summary":"There is a technical challenge and interesting requirement in my job that requires a lightweight snapshot capability for a folder/set of files. Technically, it should be OK to simply create a volume snapshot with the cloud vendor - however, creating such snapshots takes a lot of time and potentially a lot of space; it\u0026rsquo;s not the cheapest solution for this.\n","title":"Trying ZFS filesystems","type":"posts"},{"content":"","date":"1 July 2025","externalUrl":null,"permalink":"/categories/cicd/","section":"Article Categories","summary":"","title":"Cicd","type":"categories"},{"content":"","date":"1 July 2025","externalUrl":null,"permalink":"/tags/cicd/","section":"Technology Tags","summary":"","title":"Cicd","type":"tags"},{"content":"At my job, one recurring technical challenge has been syncing massive files (often ranging from 10GB to 20GB) across multiple servers. We\u0026rsquo;re essentially copying large ISO files around to various servers. The current process still somewhat works, but it is bandwidth-intensive and increasingly difficult to manage as the number of servers we need to sync this large file to grows. Traditional solutions like rsync or SCP work, but they don\u0026rsquo;t scale well when the same file needs to be pushed to dozens of machines.\nThis got me thinking: what if we treated file distribution more like content distribution? More specifically, what if we used torrents, a technology built precisely for efficiently sharing large files across many nodes?
In this post, I’ll walk through the problem we face, why torrents are an intriguing alternative, and what it might take to implement such a solution.\nWhy torrent and not rsync or scp etc # There were a couple of requirements that we needed in order to set up tests\nAble to sync large files Able to sync said files across multiple servers (easily more than a dozen) in a relatively short time Able to ensure that the large files are synced correctly. Ideally, there would be a mechanism to check that the files are synced correctly; any mis-sync would result in failures in the tests that are to be executed. Ensure that it does not overwhelm the main server where the download originates. Our first naive implementation assumed that bandwidth was \u0026ldquo;unlimited\u0026rdquo; for the server: all the servers connected to the main origin server that produced the large file and transferred the file via scp/rsync. It literally melted the server and maxed out its bandwidth. That initial attempt took 40 minutes or more - probably because more time was spent by the server round-robining across all the connections. No wonder it took such a long time.\nThe next approach naturally focused on not transferring all this data at the same time - what if all the servers had a queue? We could simply have the servers queue up and, one at a time, scp the file from the main origin server to the destination server. This definitely worked, and it took roughly 2-2.5 minutes per server. This would mean that with 10 servers, it could take 20-25 minutes to transfer the file to all of them.\nHowever, we can definitely go faster (although I\u0026rsquo;m not sure there is a need to at the moment). What if all the destination servers that need the file could download parts of the file from each other?
That\u0026rsquo;s technically the whole point of torrenting - it\u0026rsquo;s the capability to share files peer to peer, copying parts of a file from any peers that already have those portions.\nPlanning out the project # I wasn\u0026rsquo;t too familiar with torrenting technology, but luckily I saw a reference to the CodeCrafters course on building your own BitTorrent client. What we want here is a torrent setup that is not connected to the internet. The torrent tracker has to be internal, and the torrent clients cannot connect to the internet either. Everything has to be internal.\nThis is the reference link: https://app.codecrafters.io/courses/bittorrent/overview\nThe overall steps seem to be as follows:\nUnderstanding the torrent file Apparently, this involves understanding the encoding method (bencoding) that torrent files use Understand how to get the peer list + initiate the download of a piece of the file via torrent We will skip writing a blog post on this portion. Essentially, we just need to follow and try out the content on CodeCrafters Build out 2 BitTorrent clients that are able to communicate with each other. We can assume that 1 of the torrent clients already has the file - the other client simply needs to run through all the blocks to copy the file over. Build out a torrent tracker Have the BitTorrent clients connect to it and report/get stats from it. Peer information should be retrievable from it Build out a torrent manager of sorts. This tool doesn\u0026rsquo;t need to track torrents per se, but users of this tool should be able to go to it, upload a file and receive a torrent file that can then be passed to the torrent clients built before Try the tool for real with multiple clients connecting to one main client that holds the origin file. At this stage we need to check that the load is truly shared between all clients.
https://www.bittorrent.org/bittorrentecon.pdf How to ensure that one client is not overloaded - how to penalize or reward clients so that the workload can be spread around This is a pretty hefty list of items to go through, so we definitely cannot cover it in a single post. However, I will link all related posts to this post to make it the main post that I can refer to when discussing technical issues about torrents\nLinks # TODO - No links yet available. Will be added in the future\n","date":"1 July 2025","externalUrl":null,"permalink":"/solving-the-file-sync-bottleneck-in-tests-how-torrenting-could-be-the-answer-part-1/","section":"Posts","summary":"At my job, one recurring technical challenge has been syncing massive files (often ranging from 10GB to 20GB) across multiple servers. We’re essentially copying large ISO files around to various servers. The current process still somewhat works, but it is bandwidth-intensive and increasingly difficult to manage as the number of servers we need to sync this large file to grows. Traditional solutions like rsync or SCP work, but they don’t scale well when the same file needs to be pushed to dozens of machines.\n","title":"Solving the File Sync Bottleneck in tests: How Torrenting Could Be the Answer - Part 1","type":"posts"},{"content":"","date":"1 July 2025","externalUrl":null,"permalink":"/categories/torrent/","section":"Article Categories","summary":"","title":"Torrent","type":"categories"},{"content":"","date":"1 July 2025","externalUrl":null,"permalink":"/tags/torrent/","section":"Technology Tags","summary":"","title":"Torrent","type":"tags"},{"content":"Part of my job involves dealing with Gitlab on a daily basis. Gitlab is a complicated beast to handle, and it took a while to get my head around the various features that the product offers.
One of the offerings available is the ability to set an entire Kubernetes cluster as a runner target, where we can then create containers and run tests on said cluster.\nSome of the benefits of doing this are:\nInstead of having separate machines where workloads may not be efficiently used, we centralize them. Multiple teams can share the same resource, and there is only a single machine/setup that the DevOps team needs to look at and manage Some of the cons:\nComplicated setup. It involves setting up a Kubernetes cluster and managing it (including software updates, etc.). However, we don\u0026rsquo;t necessarily need to \u0026ldquo;maintain\u0026rdquo; the cluster 100% of the time. We could explore a blue-green style of deployment, where we set up a new, updated cluster and bring down the old one accordingly (this requires further research) This blog post covers setting up a Gitlab server and a Kubernetes cluster runner - we would then see how the setup behaves. Do note that the setup here is not secure at all - we skip steps such as domain registration or even adding SSL certs - it\u0026rsquo;s always good to follow such best practices wherever possible.\nInstall Gitlab Server on Google Compute Instance # We will follow the steps mentioned here. 
Do note that Gitlab is a pretty resource-hungry product - we would need to deploy a fairly powerful machine here (the experiment below worked decently with 4 CPUs and 16 GB RAM)\nReference Link:\nhttps://about.gitlab.com/install/#debian\nRun the following commands to install Gitlab Community Edition\nsudo apt-get update sudo apt-get install -y curl openssh-server ca-certificates perl sudo apt-get install -y postfix curl https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | sudo bash sudo EXTERNAL_URL=\u0026#34;http://\u0026lt;IP Address\u0026gt;\u0026#34; apt-get install gitlab-ce cat /etc/gitlab/initial_root_password Next, we relax a bunch of settings to get a \u0026ldquo;testing version\u0026rdquo; of Gitlab\nhttps://docs.gitlab.com/omnibus/settings/nginx.html\nSome of the configuration is rolled back so that we can test locally. We will edit /etc/gitlab/gitlab.rb to get a less secure deployment that is easier to test.\nnginx[\u0026#39;enable\u0026#39;] = true nginx[\u0026#39;client_max_body_size\u0026#39;] = \u0026#39;250m\u0026#39; nginx[\u0026#39;redirect_http_to_https\u0026#39;] = false nginx[\u0026#39;listen_addresses\u0026#39;] = [\u0026#39;*\u0026#39;, \u0026#39;[::]\u0026#39;] external_url = \u0026#39;xxx\u0026#39; Do note that external_url should point to the public IP address of the instance.\nOnce we have configured the gitlab.rb file, we can run the following command to have Gitlab reconfigure the various files on the server.\nsudo gitlab-ctl reconfigure Connecting a kubernetes cluster # The first step is to create a Google Kubernetes Engine cluster.\nNext, we install Helm locally (on the machine that has kubectl access to the GKE cluster)\nhttps://helm.sh/docs/intro/install/\n# Install helm locally curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash # Adds gitlab helm repo helm repo add gitlab https://charts.gitlab.io Create the values.yaml file. 
We need to replace the images - it seems the default ones cannot be reached/accessed easily. We also need the runner token: https://docs.gitlab.com/ci/runners/runners_scope/#create-an-instance-runner-with-a-runner-authentication-token\nimage: registry: \u0026#39;\u0026#39; image: gitlab/gitlab-runner gitlabUrl: http://\u0026lt;IP Address\u0026gt; runnerToken: glrt-t1_xxxxxxx rbac: create: true serviceAccount: create: true runners: # https://docs.gitlab.com/runner/configuration/advanced-configuration.html config: | [[runners]] [runners.kubernetes] helper_image = \u0026#34;gitlab/gitlab-runner-helper:alpine3.21-x86_64-ef636327\u0026#34; namespace = \u0026#34;{{.Release.Namespace}}\u0026#34; image = \u0026#34;alpine\u0026#34; For experimentation purposes, we will install it into a new namespace, zzz. To create the namespace, make sure kubectl is configured to access the cluster and then run:\nkubectl create namespace zzz Once we have the values.yaml file and the namespace, we can run the following command to install the Helm chart\nhelm upgrade --install --namespace zzz gitlab-runner -f values.yaml gitlab/gitlab-runner To test our setup, we can create a simple empty repo and add the following to its .gitlab-ci.yml file. The job is tagged containers, so the runner needs that tag as well (it can be set when creating the runner in the Gitlab UI). Once the code is pushed, it should immediately trigger a run\nbuild-job: image: nginx:latest stage: build script: - echo \u0026#34;Hello, xxx\u0026#34; tags: - containers Troubleshooting # One important thing to note is that the runner pod can take up to 90s to become \u0026ldquo;ready\u0026rdquo;. 
However, it registers with Gitlab pretty quickly\nHere are some of the issues I faced while doing the above setup:\nPreparing environment 00:00 ERROR: Error cleaning up secrets: resource name may not be empty ERROR: Job failed (system failure): prepare environment: setting up credentials: secrets is forbidden: User \u0026#34;system:serviceaccount:zzz:default\u0026#34; cannot create resource \u0026#34;secrets\u0026#34; in API group \u0026#34;\u0026#34; in the namespace \u0026#34;zzz\u0026#34;. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information This issue occurs because the Gitlab runner Helm installation does not create RBAC rules by default, while most Kubernetes clusters now have RBAC enabled (secure by default). Disabling RBAC is too much work, so it is easier to let the Helm chart create the rules (rbac.create: true in values.yaml). Note that the chart takes a shortcut here by granting \u0026lsquo;*\u0026rsquo; access to critical Kubernetes APIs such as pod creation/secret creation\nWARNING: Event retrieved from the cluster: Failed to pull image \u0026#34;registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.10.1\u0026#34;: failed to pull and unpack image \u0026#34;registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.10.1\u0026#34;: failed to copy: httpReadSeeker: failed open: failed to do request: This issue came up because the default helper image is apparently not reachable/accessible. We need to override it - see helper_image under the config key in values.yaml\nWARNING: Event retrieved from the cluster: Unable to retrieve some image pull secrets (runner-t1sflzcw-project-1-concurrent-0-ixn6wo8t); attempting to pull the image may not succeed. 
WARNING: Event retrieved from the cluster: Failed to pull image \u0026#34;gitlab/gitlab-runner-helper:latest\u0026#34;: rpc error: code = NotFound desc = failed to pull and unpack image \u0026#34;docker.io/gitlab/gitlab-runner-helper:latest\u0026#34;: failed to resolve reference \u0026#34;docker.io/gitlab/gitlab-runner-helper:latest\u0026#34;: docker.io/gitlab/gitlab-runner-helper:latest: not found WARNING: Event retrieved from the cluster: Error: ErrImagePull WARNING: Event retrieved from the cluster: Error: ImagePullBackOff Similar to the above, but just an FYI: the latest tag doesn\u0026rsquo;t exist for the gitlab-runner-helper image. We need to specify an exact image tag for it to work.\n","date":"8 April 2025","externalUrl":null,"permalink":"/gke-as-gitlab-runner/","section":"Posts","summary":"Part of my job involves dealing with Gitlab on a daily basis. Gitlab is a complicated beast to handle, and it took a while to get my head around the various features that the product offers. One of the offerings available is the ability to set an entire Kubernetes cluster as a runner target, where we can then create containers and run tests on said cluster.\n","title":"GKE as Gitlab Runner","type":"posts"},{"content":"","date":"8 April 2025","externalUrl":null,"permalink":"/categories/kubernetes/","section":"Article Categories","summary":"","title":"Kubernetes","type":"categories"},{"content":"","date":"8 April 2025","externalUrl":null,"permalink":"/tags/kubernetes/","section":"Technology Tags","summary":"","title":"Kubernetes","type":"tags"},{"content":"When we initially start playing around with compute instances in the cloud, we generally just deploy instances without thinking too much about it. We don\u0026rsquo;t think about the application requirements, such as how much CPU or memory it may require. 
But with experience, we learn the importance of providing sufficient resources to the applications that we install on the server - and a pretty big one is the amount of storage we allocate to the server for our application.\nAt first glance, we may simply create an instance with a pretty large disk. However, by default on most cloud vendors (Google Cloud included), the main volume attached to the instance holds the \u0026ldquo;root\u0026rdquo; partition. This root partition generally contains the OS files, kernel, etc. It is not a problem to simply give the root partition a bigger disk, but it is not really necessary per se. Let\u0026rsquo;s go through a couple of examples where simply increasing the root partition may not be ideal\nReasons for having multiple partitions:\nBeing able to precisely back up the database files and storage. It is quite easy to go to the console and click on the \u0026ldquo;Clone disk\u0026rdquo; option. However, if we mix both OS level files and database files on one disk, we are technically cloning all of those files - it is way harder to separate them. By having just the database files in a single volume, we can clone that disk, mount it on a different system and run experiments such as upgrading database server versions or even upgrading the Linux OS/kernel Compliance reasons - there can be various different reasons for why different partitions exist: https://techgirlkb.guru/2019/08/how-to-create-cis-compliant-partitions-on-aws/ We won\u0026rsquo;t go too deep into why we would want multiple partitions - instead, we can look into how to set them up.\nSetting up first instance # We can set up an instance with 2 disks attached to it. One of them is the usual root disk (which holds our Linux OS - typically Debian by default). 
The second disk will be used to store data for a database (MariaDB), which typically holds its data in /var/lib/mysql\nWhen we start the instance we can run the command:\nlsblk # NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS # sda 8:0 0 100G 0 disk # sdb 8:16 0 10G 0 disk # ├─sdb1 8:17 0 9.9G 0 part / # ├─sdb14 8:30 0 3M 0 part # └─sdb15 8:31 0 124M 0 part /boot/efi In this output, sda is the empty 100G data disk while sdb holds the root filesystem (device names can differ between instances, so always confirm with lsblk). We first need to partition and then format the sda device.\nsudo apt update sudo apt install -y fdisk sudo fdisk /dev/sda # Press n for new partition # Leave 1 as default for configuring first partition # Press enter again to set the default first sector # Press enter again to set the default last sector # Press p to see the current configurations # Press w to write the partition table to the device sudo mkfs -t ext4 /dev/sda1 The disk is now formatted, so we can mount it:\nsudo mkdir -p /var/lib/mysql sudo mount -t auto /dev/sda1 /var/lib/mysql The next step is to install MariaDB\nsudo apt install -y mariadb-server At this stage, if we list out the files in the /var/lib/mysql folder, we should see some files already there. 
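Before putting data into MariaDB, it can be worth confirming that /var/lib/mysql really sits on the new disk rather than on the root partition. For example, if the mount step was skipped or failed, MariaDB would quietly write to the root disk instead. Below is a minimal sketch in Python, assuming Python 3 is available on the instance; the helper name is_separate_mount is hypothetical. It compares the device IDs that os.stat reports for the data directory and for /.

```python
import os


def is_separate_mount(path, root='/'):
    '''Return True if path is on a different filesystem device than root.

    is_separate_mount is a hypothetical helper: comparing st_dev is a quick
    sanity check that a data directory lives on its own disk.
    '''
    return os.stat(path).st_dev != os.stat(root).st_dev


if __name__ == '__main__':
    data_dir = '/var/lib/mysql'
    if os.path.exists(data_dir):
        print(data_dir, 'is on a separate disk:', is_separate_mount(data_dir))
    else:
        print(data_dir, 'does not exist yet')
```

If this prints False after the mount step, the database files would end up on the root disk and would not be part of a clone of the data disk.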
Next, we populate it with some data for the next experiment\nPopulating MariaDB server # We can do so by running the following:\n# Become root user sudo su - # Go into mysql console mysql # Create database CREATE DATABASE hehe; # Create table USE hehe; CREATE TABLE Persons ( PersonID int, LastName varchar(255), FirstName varchar(255), Address varchar(255), City varchar(255) ); # Inserting some data INSERT INTO Persons Values (0, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aca\u0026#39;); INSERT INTO Persons Values (1, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aca\u0026#39;); INSERT INTO Persons Values (2, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aa\u0026#39;, \u0026#39;aca\u0026#39;); # Viewing data select * from Persons; # +----------+----------+-----------+---------+------+ # | PersonID | LastName | FirstName | Address | City | # +----------+----------+-----------+---------+------+ # | 0 | aa | aa | aa | aca | # | 1 | aa | aa | aa | aca | # | 2 | aa | aa | aa | aca | # +----------+----------+-----------+---------+------+ # 3 rows in set (0.001 sec) Cloning disk and mounting on different server # Once we have populated some data, we can go back to the Google Cloud Console and clone the disk. We can then attach this cloned disk to a different Google Cloud instance.\nWe would need to run the following on the new instance:\n# Update sudo apt update # Mkdir sudo mkdir -p /var/lib/mysql # Need to check it via some commands. e.g. 
lsblk # Get the right device ID (/dev/xxxx) sudo mount -t auto /dev/sda1 /var/lib/mysql # At this point, you should already see a bunch of database files here # Install mariadb-server sudo apt install -y mariadb-server Now, if we run the following\nsudo mysql # Viewing data USE hehe; select * from Persons; # +----------+----------+-----------+---------+------+ # | PersonID | LastName | FirstName | Address | City | # +----------+----------+-----------+---------+------+ # | 0 | aa | aa | aa | aca | # | 1 | aa | aa | aa | aca | # | 2 | aa | aa | aa | aca | # +----------+----------+-----------+---------+------+ # 3 rows in set (0.001 sec) We should be able to see the same set of data\nIncreasing size of disk # We can also increase the size of the disk, but for such non-root partitions it takes a bit more effort. First, go to the Google Cloud Console and increase the size of the disk\nAn example of lsblk after increasing the disk size, but before resizing the partition within the instance\nlsblk # NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS # sda 8:0 0 180G 0 disk # └─sda1 8:1 0 150G 0 part /var/lib/mysql # sdb 8:16 0 10G 0 disk # ├─sdb1 8:17 0 9.9G 0 part / # ├─sdb14 8:30 0 3M 0 part # └─sdb15 8:31 0 124M 0 part /boot/efi Notice the difference in size between sda (180G) and sda1 (150G)\nThe next set of commands resizes the partition (heavily based on the Google Cloud documentation)\nsudo parted /dev/sda # Opens the interactive parted prompt # Command on what to do next resizepart # Which partition to increase 1 # Confirm resizing a partition that is in use Yes # Where the partition should end 100% # Quit quit # Might be needed so that the kernel picks up the new partition table sudo partprobe /dev/sda We then need to resize the filesystem as well\nsudo resize2fs /dev/sda1 Conclusion # With that, we have played around with keeping data on its own disk, which makes snapshots and disk cloning convenient for testing out systems. 
One possible use case is cloning the data from a production system to run queries against it, or to provide a staging environment for running integration tests (there is no better test than one against production data, after all)\nReferences # https://www.zdnet.com/article/how-to-format-a-drive-on-linux-from-the-command-line/\nhttps://cloud.google.com/compute/docs/disks/resize-persistent-disk\n","date":"1 April 2025","externalUrl":null,"permalink":"/configuring-compute-storage/","section":"Posts","summary":"When we initially start playing around with compute instances in the cloud, we generally just deploy instances without thinking too much about it. We don’t think about the application requirements, such as how much CPU or memory it may require. But with experience, we learn the importance of providing sufficient resources to the applications that we install on the server - and a pretty big one is the amount of storage we allocate to the server for our application.\n","title":"Configuring Compute Storage","type":"posts"},{"content":" Container based security measures Smaller images for code execution platform Not running the container as root Kubernetes related Run the deployment in different namespace Setting up a new Service account in kubernetes Ensuring service account token is not mounted in potentially vulnerable pods Ensuring that the container is started with non-root access Ensuring resource limits are set Set security context Setting network policy Using a stricter seccomp/apparmor profile Tool related Ensure limited logs sniffed Ensure that there is a time limit of code executions Future efforts I had previously attempted to build a code assessment tool in docker. 
That involves doing the following:\nBuild a web application which a user can interact with Have a separate worker that starts containers to run the encapsulated code Capture all of that data into some sort of database The codebase for this can be found here: https://github.com/hairizuanbinnoorazman/Python_programming/tree/master/docker_code_executor\nHowever, the above simple solution only works on a single node - if we need to run hundreds/thousands of code runs at one time, a single node might not be able to handle it, and we would need to scale out.\nJust a note here: building a code assessment tool involves a lot more than simply the code execution platform. There is also the part about providing the \u0026ldquo;unit\u0026rdquo; tests, where test cases are run against code provided by the user of the platform. There is also the rewards system, etc. However, these sections are \u0026ldquo;easier\u0026rdquo; or less interesting to talk about compared to the code execution portion - that is the part that would interest someone who delves into code/infrastructure: how to adopt the best security posture when accepting potentially malicious code.\nWith regards to the implementation, this is the rough approach that I have in mind (a more thorough approach might be better than what I have in mind)\nCreate a piece of controller/web application code that is able to manipulate Kubernetes resources. This controller code would create jobs/pods that inject the third party code, run it and store the results. 
User submitted code will be loaded in via ConfigMaps (which can store up to 1 MiB) - we should limit what can be passed to the execution engine Logs are temporarily stored in pods - these are fetched into the web application The focus of this post is more on the security measures taken to harden the code execution portion, in order to limit the blast radius of potential issues from user submitted code.\nRefer to the implementation here: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/code-executor-k8s\nContainer based security measures # Smaller images for code execution platform # Run the application on smaller Docker images where fewer utilities are available. For Python, there is the \u0026ldquo;normal\u0026rdquo; python image, but there are also slim and alpine editions. There is also the choice of distroless, but this would take a bit of research to get working.\nNot running the container as root # We need to ensure that the user in the container is not root. That limits what a user would be able to do within the container, since most things that alter file state on a server require root permissions.\nKubernetes related # Run the deployment in different namespace # There is not much isolation when it comes to workloads and connectivity between pods in a Kubernetes cluster. We can\u0026rsquo;t use namespaces to properly isolate the various pod executions - by default, pods can still talk to each other across namespaces. Sometimes this is a feature that people rely on, e.g. when they put all monitoring related pods in one namespace and the rest in an \u0026ldquo;application\u0026rdquo; namespace. 
One benefit of this slight segregation is that resources are easy to delete.\nIf we know that the application is \u0026ldquo;hacked\u0026rdquo; and causing security issues, we can go in and delete the entire namespace - all the resources tied to that namespace are deleted along with it.\nSetting up a new Service account in kubernetes # Kubernetes creates pods with a service account - if none is specified, it uses the default one. Other things may have been granted to the default service account that do not fit what we\u0026rsquo;re trying to build here, so it\u0026rsquo;s better to start from a clean slate by creating a new service account. If the default service account already has some special Kubernetes permissions, that immediately raises the risk for the application.\nFor this, we would first create a service account that grants certain access to our controller/web application - e.g. viewing pod logs, creating Kubernetes jobs/pods, and listing/deleting jobs and pods.\nWe would also need to create another service account that has 0 permissions to access any of the Kubernetes APIs. This is the service account used by the pods that run the third party submitted code.\nEnsuring service account token is not mounted in potentially vulnerable pods # Technically, the submitted code running in these potentially vulnerable pods needs 0 access to the Kubernetes API. Even if one argues that it is needed, granting it would definitely raise the risk of running such third party code.\nNot mounting the token reduces the risk that a third party could take over and cause damage by contacting the Kubernetes API.\nEnsuring that the container is started with non-root access # We can ensure this behaviour by setting the runAsNonRoot flag in the Kubernetes manifest file. 
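To make the settings in this section more concrete, here is a minimal sketch that builds a pod spec as a plain dict. It is written in Python rather than the Go used by the actual implementation, and it uses the standard Kubernetes manifest field names. The spec combines a dedicated service account, no mounted API token, runAsNonRoot, resource limits and a locked-down container security context. The service account name code-runner, the image and the resource numbers are hypothetical placeholders.

```python
import json


def code_runner_pod_spec(image, service_account='code-runner'):
    '''Build a pod spec dict (Kubernetes manifest field names) for untrusted code.

    code-runner and the resource numbers below are hypothetical placeholders.
    '''
    return {
        # Dedicated service account with no RBAC bindings
        'serviceAccountName': service_account,
        # Do not mount an API token into the pod
        'automountServiceAccountToken': False,
        'restartPolicy': 'Never',
        'securityContext': {
            # Kubelet refuses to start a container that would run as UID 0
            'runAsNonRoot': True,
            'runAsUser': 3000,
            'runAsGroup': 3000,
            'seccompProfile': {'type': 'RuntimeDefault'},
        },
        'containers': [{
            'name': 'runner',
            'image': image,
            # Limits stop a single submission from taking over a whole node
            'resources': {
                'requests': {'cpu': '100m', 'memory': '64Mi'},
                'limits': {'cpu': '500m', 'memory': '256Mi'},
            },
            'securityContext': {
                'allowPrivilegeEscalation': False,
                'readOnlyRootFilesystem': True,
                'capabilities': {'drop': ['ALL']},
            },
        }],
    }


print(json.dumps(code_runner_pod_spec('python:3.12-slim'), indent=2))
```

The resulting dict can be serialized to YAML or JSON and used as the pod template of a Kubernetes Job.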
If the Docker image does not set a non-root user, the container will not run - the pod description will show a container configuration error (CreateContainerConfigError) and the container/pod will not be able to start running\nEnsuring resource limits are set # If limits are not set, the pods can technically expand their usage to take over the entire cluster, assuming that the higher priority pods have already taken their share of resources. This is potentially bad - let\u0026rsquo;s say a single pod can take up the resources of an entire node. If we have 5 nodes, just 5 pods (which could be just 5 code submissions) could bring our entire software to a complete stop (for a couple of seconds/minutes), depending on the time limit of the Kubernetes job.\nSet security context # Security context is a pretty important field to configure to ensure that we adopt a proper security posture for our apps. Here, we can alter various settings such as:\n... SecurityContext: \u0026amp;core.PodSecurityContext{ SELinuxOptions: \u0026amp;core.SELinuxOptions{}, RunAsNonRoot: boolPtr(true), RunAsUser: int64Ptr(3000), RunAsGroup: int64Ptr(3000), SeccompProfile: \u0026amp;core.SeccompProfile{ Type: core.SeccompProfileTypeRuntimeDefault, }, AppArmorProfile: \u0026amp;core.AppArmorProfile{ Type: core.AppArmorProfileTypeRuntimeDefault, }, }, ... Here, we set the UID and GID of the user running in our container - it is important to run with IDs above 1000, since UIDs and GIDs below 1000 are usually reserved for system accounts (and 0 is root). We also set Seccomp and AppArmor profiles - these are common Linux mechanisms that restrict access to certain resources and system calls for the pods.\nThese configurations are at the pod level - there are more security context settings at the container level (within the pods)\n... 
SecurityContext: \u0026amp;core.SecurityContext{ Capabilities: \u0026amp;core.Capabilities{ Drop: []core.Capability{\u0026#34;all\u0026#34;}, }, Privileged: boolPtr(false), ReadOnlyRootFilesystem: boolPtr(true), AllowPrivilegeEscalation: boolPtr(false), }, ... Here, we drop Linux capabilities, set a read-only root filesystem and ensure that a normal user cannot escalate privileges (e.g. via \u0026ldquo;sudo\u0026rdquo;) to run privileged commands within the container.\nSetting network policy # What we\u0026rsquo;re building here is a \u0026ldquo;code assessment\u0026rdquo; tool, so there is very little reason to create an environment that allows internet access. It makes sense to ensure that the pod has 0 ingress and egress capabilities.\nOne reason for this restriction is to ensure that a submitter cannot run scripts that call out to some external endpoint to pull in a malicious binary. Blocking network access both ways should largely block this form of attack. Note that network policies only take effect if the cluster\u0026rsquo;s network plugin enforces them.\nReference: https://kubernetes.io/docs/concepts/services-networking/network-policies/\nUsing a stricter seccomp/apparmor profile # Right now, there isn\u0026rsquo;t a convenient built-in way to distribute AppArmor or seccomp profiles across the various nodes in the cluster. Without any tool, we would need to go to every node and add the profile to a specific folder on each one (technically, this is possible with provisioning/infrastructure templating tools).\nHowever, this is not covered in the above implementation - it probably deserves its own blog post, covering in greater detail what the operator referenced below does.\nReference: https://github.com/kubernetes-sigs/security-profiles-operator\nTool related # Ensure limited logs sniffed # We definitely need to limit the amount of logs that will be collected by the web application. 
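One way to enforce such a limit is to stop reading as soon as a line or byte budget is used up, instead of loading everything first. Below is a minimal sketch in Python, assuming the logs arrive as an iterable of lines; capture_limited_logs and its default limits are hypothetical. The runaway generator mirrors the nested loop program described in this section. Because the reader stops early, only a handful of its lines are ever produced. When fetching logs from Kubernetes directly, the pod log API also accepts limitBytes and tailLines options for the same purpose.

```python
def capture_limited_logs(lines, max_lines=1000, max_bytes=64 * 1024):
    '''Collect at most max_lines lines and max_bytes bytes from an iterable.

    Returns (kept_lines, truncated). Stops pulling from the iterable as soon
    as a limit would be exceeded, so a runaway producer is never drained.
    '''
    kept = []
    used_bytes = 0
    for line in lines:
        size = len(line.encode('utf-8')) + 1  # +1 for the newline separator
        if len(kept) >= max_lines or used_bytes + size > max_bytes:
            return kept, True
        kept.append(line)
        used_bytes += size
    return kept, False


def runaway_program():
    # Same shape as the nested loop in this section, produced lazily
    for a in range(1, 1000):
        for b in range(1, 1000):
            for c in range(1, 1000):
                yield f'{a} - {b} - {c}'


kept, truncated = capture_limited_logs(runaway_program(), max_lines=5)
# kept holds the first 5 lines and truncated is True
print(kept)
print(truncated)
```

A truncated flag like this lets the web application show a clear marker to the user instead of streaming a billion lines into the browser.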
While initially writing this controller/application, I did not set a log limit. I then wrote a small program that runs a loop in a loop in a loop:\nfor a in range(1, 1000): for b in range(1, 1000): for c in range(1, 1000): print(f\u0026#34;{a} - {b} - {c}\u0026#34;) This code can immediately cause issues:\nThe code takes a long time to run (but we can control the time limit of the code executions) Each iteration writes a line - in this case, that is 999 × 999 × 999 = 997,002,999 lines of logs (range(1, 1000) yields 999 values), close to a billion. Where are these logs going to be stored and how will they be shown to the user? When showing the logs on the website, it immediately crashed the Chrome tab - definitely an issue\nEnsure that there is a time limit of code executions # We can\u0026rsquo;t wait indefinitely for code executions to complete or fail - a time limit needs to be set. A third party submitter could simply submit code that runs forever, which would mean that the pods are created and left to exist for a very long time in the cluster. Even if we allow this but ensure that each pod only takes a small amount of resources, such pods would eventually fill up the Kubernetes cluster and cause further problems down the line. For Kubernetes Jobs, the activeDeadlineSeconds field is one way to enforce such a limit.\nFuture efforts # The implementation mentioned here: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/code-executor-k8s is an initial implementation (as of November 2024) and is not a hardened approach to this problem. It covers the minimal, easy-to-do security measures. However, there is more that can be done:\nRun the following scanner to check compliance: https://github.com/kubescape/kubescape Run micro-VMs (e.g. Kata Containers) or a sandboxed runtime such as gVisor. Micro-VMs require nodes that support provisioning of VMs - on cloud providers, that means virtual machines that support nested virtualization (\u0026ldquo;VM in VM\u0026rdquo;). 
However, this approach would lead to a significant slowdown in executing code. Run sqlmap scans? This is to ensure that the application accepting third party submissions is not vulnerable to SQL injection Rate limiting (Current version has no concept of rate limiting) User authentication (Current version is just a prototype and doesn\u0026rsquo;t do auth/authorizations) ","date":"10 November 2024","externalUrl":null,"permalink":"/building-a-code-assessment-tool-but-in-kubernetes/","section":"Posts","summary":" Container based security measures Smaller images for code execution platform Not running the container as root Kubernetes related Run the deployment in different namespace Setting up a new Service account in kubernetes Ensuring service account token is not mounted in potentially vulnerable pods Ensuring that the container is started with non-root access Ensuring resource limits are set Set security context Setting network policy Using a stricter seccomp/apparmor profile Tool related Ensure limited logs sniffed Ensure that there is a time limit of code executions Future efforts I had previously attempted to build a code assessment tool in docker. That involves doing the following:\n","title":"Building a code assessment tool but in Kubernetes","type":"posts"},{"content":"This is a continuation of the previous blog post for automating Jenkins server setup. The previous setup only created a single node Jenkins build server. This definitely won\u0026rsquo;t be sufficient for larger teams that build applications and run workflows more frequently. Refer to the page: Automating Jenkins Initial Setup\nThe next step to automate is adding worker (agent) nodes to the cluster. 
Before going down that route, let\u0026rsquo;s first add an agent manually, extending the previous setup.\nLet\u0026rsquo;s first aim to set up the agent on the same machine, but with the main/controller and the worker nodes as separate Docker containers.\nManually connect Jenkins agent to main Jenkins node # First, we\u0026rsquo;ll need to set up a new Docker network - this allows the containers to talk to each other.\ndocker network create cicd The next step is to create the Jenkins main node\ndocker build -t cjenkins . docker run --name jenkins -p 8090:8080 --network cicd -d cjenkins The Dockerfile and how to build it are covered in the previous post. This post focuses on how we can connect the agent to the main/controller Jenkins server.\nOnce our Jenkins main controller is running, the next step is to manually connect our Jenkins agent. The first step is to click Manage Jenkins\nThe next step is to click Manage Nodes\nWe can then create a node that our main Jenkins node will be managing.\nName: zzz Number of executors: 1 Remote root directory: /home/jenkins Labels: local Usage: Use this node as much as possible Launch Method: Launch agent by connecting it to the controller Availability: Keep this agent online as much as possible Once we have configured it, we can run the following Jenkins agent container to connect it to the Jenkins main node.\ndocker run --name agent -d --network cicd jenkins/agent java -jar /usr/share/jenkins/agent.jar -url http://jenkins:8080/ -secret \u0026lt;new secret always generated\u0026gt; -name zzz -workDir \u0026#34;/home/jenkins\u0026#34; With that, the Jenkins node should be available for use. 
However, the above setup is entirely manual - we can definitely do better.\nDocker-compose setup of Jenkins # TLDR - the setup will be maintained here: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/jenkins. However, the repository might look different from what is shown on the blog, since it will be updated to keep up with the times or to introduce new features.\nThis setup is a more \u0026ldquo;automated\u0026rdquo; setup of a Jenkins main controller node as well as a worker node. It is probably a \u0026ldquo;better\u0026rdquo; way to set up Jenkins clusters, since with such automation in place we specify almost everything our Jenkins nodes need - e.g. the secrets/keys the nodes use to connect to each other.\nThe first part is to alter our Jenkins controller node\u0026rsquo;s Dockerfile.\nFROM jenkins/jenkins:latest COPY plugins.txt /var/jenkins_home/plugins.txt RUN /bin/jenkins-plugin-cli -f /var/jenkins_home/plugins.txt COPY jenkins.yaml /var/jenkins_home/jenkins.yaml ENV JAVA_OPTS \u0026#34;-Djenkins.install.runSetupWizard=false ${JAVA_OPTS:-}\u0026#34; ENV CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins.yaml ENV SSH_PRIVATE_FILE_PATH=/home/jenkins/.ssh/ultimate_ssh_key RUN git config --global user.email \u0026#34;you@example.com\u0026#34; \u0026amp;\u0026amp; \\ git config --global user.name \u0026#34;Your Name\u0026#34; COPY jobs /home/jobs COPY pipelines /home/pipelines USER root RUN mkdir -p /home/jenkins/.ssh \u0026amp;\u0026amp; chown jenkins:jenkins /home/jenkins/.ssh USER jenkins The first few lines are probably something you would have seen in previous blog posts on Jenkins. However, there are several new lines that might be of interest:\nENV SSH_PRIVATE_FILE_PATH=/home/jenkins/.ssh/ultimate_ssh_key ... 
USER root RUN mkdir -p /home/jenkins/.ssh \u0026amp;\u0026amp; chown jenkins:jenkins /home/jenkins/.ssh USER jenkins This set of lines prepares the main Jenkins controller node to use ssh keys to communicate with other Jenkins nodes. It\u0026rsquo;s definitely a pain to connect Jenkins nodes together manually, as in the earlier part of this blog post. SSH keys seem to be a saner (and possibly safer) option here.\nAn important thing to note here is that we create the .ssh directory at /home/jenkins and set the owner of that folder to jenkins. This ensures that the user in control of our Docker container is able to access the ssh files.\nRUN git config --global user.email \u0026#34;you@example.com\u0026#34; \u0026amp;\u0026amp; \\ git config --global user.name \u0026#34;Your Name\u0026#34; COPY jobs /home/jobs COPY pipelines /home/pipelines These lines are mostly focused on letting us set up pipeline jobs on Jenkins and have them available immediately. One of the Jenkins jobs requires git operations to read the pipeline\u0026rsquo;s Jenkinsfile code into Jenkins - that needs the git tool. However, the git tool is somewhat unusable until we set its initial configuration, such as the global user.email and user.name.\nThe jobs copied here assist in creating the Jenkins pipelines.
Jenkins pipeline configurations cannot be defined directly in Jenkins Configuration as Code - however, there is the Job DSL, which we can use to define simple Jenkins jobs that in turn define our Jenkins pipeline jobs.\nThat\u0026rsquo;s it for our main Jenkins controller node\u0026rsquo;s Dockerfile.\nThe next portion is our Jenkins agent\u0026rsquo;s Dockerfile.\nFROM jenkins/agent USER root RUN mkdir -p /home/jenkins/.ssh \u0026amp;\u0026amp; chown jenkins:jenkins /home/jenkins/.ssh RUN apt update \u0026amp;\u0026amp; apt install -y openssh-server RUN ssh-keygen -A \u0026amp;\u0026amp; service ssh --full-restart CMD [\u0026#34;/usr/sbin/sshd\u0026#34;, \u0026#34;-D\u0026#34;] Since sshd needs to run as root, we have no choice but to run the container as root here - I still need to figure out how to avoid that, but that\u0026rsquo;ll be a problem for another day.\nAlso, similar to the Jenkins controller\u0026rsquo;s Dockerfile, we create the .ssh folder and set its owner to jenkins.\nThe final bit to get it all working together with a single command is to write a docker-compose.yaml. With the above Dockerfiles, it should hopefully work with the following docker-compose.yaml definition.\nversion: \u0026#39;3.3\u0026#39; services: jenkins: build: context: . dockerfile: Dockerfile ports: - 8090:8080 restart: always volumes: - type: bind source: ./secrets/private target: /home/jenkins/.ssh/ read_only: true agent: build: context: .
dockerfile: agent.Dockerfile restart: always volumes: - type: bind source: ./secrets/public target: /home/jenkins/.ssh/ read_only: true To avoid baking our ssh keys into the docker image, we mount them as volumes instead of adding them in a Dockerfile. During my first attempt at this, I added them within a Dockerfile, and only realized quite a while later that this is a very bad move (e.g. in case someone managed to get access to the image or to a terminal inside the docker container).\nMake sure these folders exist within the same folder that contains our main Jenkins controller\u0026rsquo;s Dockerfile (named Dockerfile) as well as our agent\u0026rsquo;s Dockerfile (named agent.Dockerfile). Our ssh keys should be in the ./secrets folder, which has one public folder and one private folder. The public folder only contains the authorized_keys file, which holds the public key that the agent checks against when the main Jenkins controller connects using the private ssh key. Our private folder simply holds the ultimate_ssh_key private key file.\nOne more change compared to the previous blog post is our Jenkins configuration as code yaml file.
Naturally, the focus here is on how to connect our main Jenkins controller node to other nodes.\njenkins: systemMessage: Jenkins managed via Configuration as Code securityRealm: local: allowsSignup: false users: - id: admin password: password authorizationStrategy: roleBased: roles: global: - name: \u0026#34;admin\u0026#34; description: \u0026#34;Jenkins administrators\u0026#34; permissions: - \u0026#34;Overall/Administer\u0026#34; entries: - user: \u0026#34;admin\u0026#34; - name: \u0026#34;readonly\u0026#34; description: \u0026#34;Read-only users\u0026#34; permissions: - \u0026#34;Overall/Read\u0026#34; - \u0026#34;Job/Read\u0026#34; entries: - user: \u0026#34;authenticated\u0026#34; crumbIssuer: \u0026#34;standard\u0026#34; numExecutors: 0 nodes: - permanent: labelString: \u0026#34;linux\u0026#34; mode: NORMAL name: \u0026#34;zzz\u0026#34; numExecutors: 4 remoteFS: \u0026#34;/home/jenkins\u0026#34; launcher: ssh: host: \u0026#34;agent\u0026#34; port: 22 javaPath: \u0026#34;/opt/java/openjdk/bin/java\u0026#34; credentialsId: ultimate_ssh_key launchTimeoutSeconds: 60 maxNumRetries: 3 retryWaitTime: 30 sshHostKeyVerificationStrategy: manuallyTrustedKeyVerificationStrategy: requireInitialManualTrust: false credentials: system: domainCredentials: - credentials: - usernamePassword: scope: SYSTEM id: admin username: admin password: password - basicSSHUserPrivateKey: scope: SYSTEM id: ultimate_ssh_key username: jenkins description: \u0026#34;SSH private key file.
Provided via file\u0026#34; privateKeySource: directEntry: privateKey: \u0026#34;${readFile:${SSH_PRIVATE_FILE_PATH}}\u0026#34; jobs: - file: /home/jobs/firstjob.groovy - file: /home/jobs/secondjob.groovy unclassified: # scmGit: # addGitTagAction: false # allowSecondFetch: false # createAccountBasedOnEmail: true # disableGitToolChooser: false # globalConfigEmail: jenkins@domain.local # globalConfigName: jenkins # hideCredentials: true # showEntireCommitSummaryInChanges: true # useExistingAccountWithSameEmail: false location: url: http://localhost:8090 adminAddress: admin@jenkins.com The new parts are the nodes section and the credentials section. A new ssh key credential (ultimate_ssh_key) was added, with its private key read from the file at SSH_PRIVATE_FILE_PATH; our permanent node connects to the agent container with that key via the ssh launcher configuration.\nThis change in configuration also requires installing the following plugin, which allows the Jenkins controller node to connect to agent nodes over ssh: ssh-slaves\nAfterthoughts # So far, our setup runs on a single machine only. In the future, I will probably look into expanding to multi-node setups, or even one where Jenkins connects to a Kubernetes cluster and utilizes the entire cluster as its build worker (an assumption based on the Kubernetes plugin seen in the Jenkins list of plugins page)\n","date":"10 January 2024","externalUrl":null,"permalink":"/connect-slaves-jenkins-configured-with-jcasc-docker/","section":"Posts","summary":"This is a continuation of the previous blog post for automating Jenkins server setup. The previous setup only created a setup for a single node Jenkins build server farm. This definitely won’t be sufficient for larger teams where they would be building applications and running workflows on a more frequent basis.
Refer to the page: Automating Jenkins Initial Setup\n","title":"Connect Slaves Jenkins configured with JCasC - Docker","type":"posts"},{"content":"Jenkins, a pretty popular Continuous Integration/Continuous Deployment (CI/CD) build tool, plays a pivotal role in automating the software development/deployment process. Over the years, Jenkins has evolved to become an extremely versatile automation server that facilitates continuous integration and delivery by orchestrating the building, testing, and deployment of code. Its extensibility through a vast array of plugins makes it adaptable to various environments and development workflows.\nThe initial configuration of Jenkins can pose challenges and is usually done manually. This is largely due to the way Jenkins grew over the years - previously, the concept of configuring the entire IT infrastructure that powers our app as code didn\u0026rsquo;t exist. Only when tools such as Ansible and Terraform appeared on the scene did this concept become a popular one. However, with something like Jenkins, which is run by numerous companies around the world, it\u0026rsquo;s important not to break anything - so its maintainers have to move slowly when introducing any big new changes.\nAlthough it\u0026rsquo;s now somewhat possible to automate parts of the initial Jenkins setup, it\u0026rsquo;s still pretty clunky. Hence, before we venture down that janky path, we need to understand why automating the setup of Jenkins is crucial for simplifying the management of a company\u0026rsquo;s CICD platform.
A janky/hacked system is usually a recipe for disaster for most teams, since someone will eventually need to support that hack, and they also have to find new ways of doing the same thing if the software gets updated to the point that the hack stops working.\nOnce we automate the setup of Jenkins, it becomes trivial to \u0026ldquo;toss out\u0026rdquo; the old server and create a new one, since the whole thing is already codified. There is little to no fear in bootstrapping or moving Jenkins servers around, since they can simply be recreated from scratch.\nAutomating Jenkins Initial Configuration # I will provide an example of how to set up Jenkins in an automated way within a docker image. For this use case, the setup is extremely simple - its jobs just echo out some text. We won\u0026rsquo;t be covering things such as setting up Jenkins slaves or running Jenkins jobs within a docker runner.\nAs mentioned above, Jenkins is usually configured manually. Here is the list of things we would need to do if we didn\u0026rsquo;t automate it:\nInstall the relevant Jenkins plugins that we would use for our instance. Define the pipelines that we would be able to use. Within Jenkins, we can point the pipelines to specific git repositories containing a Jenkinsfile (the pipeline definition file), which we can use to build our applications or run tasks on our infrastructure.
Here is the repo and directory that demonstrates this: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/jenkins\nDo note that the repo will continue to be updated over time - so some of its contents might not align with this blog post.\nFirst, we need a Dockerfile - within it, we need a step that installs the plugins.\nFROM jenkins/jenkins:latest COPY plugins.txt /var/jenkins_home/plugins.txt RUN /bin/jenkins-plugin-cli -f /var/jenkins_home/plugins.txt One of the important plugins that we need is the configuration-as-code plugin (which is what we want to use here). We should add it to plugins.txt. The large majority of the plugins listed in plugins.txt are the \u0026ldquo;default\u0026rdquo; set of plugins recommended for installation when going through the startup wizard during a manual setup of Jenkins.\nace-editor apache-httpcomponents-client-4-api authentication-tokens blueocean blueocean-autofavorite blueocean-bitbucket-pipeline blueocean-commons blueocean-config blueocean-core-js blueocean-dashboard blueocean-display-url blueocean-events blueocean-git-pipeline blueocean-github-pipeline blueocean-i18n blueocean-jwt blueocean-personalization blueocean-pipeline-api-impl blueocean-pipeline-editor blueocean-pipeline-scm-api blueocean-rest blueocean-rest-impl blueocean-web bootstrap5-api branch-api caffeine-api checks-api cloudbees-bitbucket-branch-source cloudbees-folder credentials credentials-binding display-url-api durable-task echarts-api favorite font-awesome-api git git-client github github-api github-branch-source handy-uri-templates-2-api htmlpublisher jackson2-api javax-activation-api javax-mail-api jaxb jenkins-design-language jjwt-api jquery3-api jsch junit mailer matrix-project role-strategy okhttp-api pipeline-build-step pipeline-graph-analysis pipeline-groovy-lib pipeline-input-step pipeline-milestone-step pipeline-model-api
pipeline-model-definition pipeline-model-extensions pipeline-stage-step pipeline-stage-tags-metadata plain-credentials plugin-util-api popper2-api pubsub-light scm-api script-security snakeyaml-api sse-gateway ssh-credentials structs token-macro trilead-api variant workflow-api workflow-basic-steps workflow-cps workflow-durable-task-step workflow-job workflow-multibranch workflow-scm-step workflow-step-api workflow-support configuration-as-code The next step is to provide a yaml file in which we define some of the properties of the Jenkins server. We also need to tell Jenkins to skip its initial setup wizard (via the runSetupWizard=false flag) and point it to that yaml file (via CASC_JENKINS_CONFIG).\nFROM jenkins/jenkins:latest COPY plugins.txt /var/jenkins_home/plugins.txt RUN /bin/jenkins-plugin-cli -f /var/jenkins_home/plugins.txt COPY jenkins.yaml /var/jenkins_home/jenkins.yaml ENV JAVA_OPTS \u0026#34;-Djenkins.install.runSetupWizard=false ${JAVA_OPTS:-}\u0026#34; ENV CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins.yaml This is the configuration file that we pass to Jenkins Configuration as Code:\njenkins: systemMessage: Jenkins managed via Configuration as Code securityRealm: local: allowsSignup: false users: - id: admin password: password authorizationStrategy: roleBased: roles: global: - name: \u0026#34;admin\u0026#34; description: \u0026#34;Jenkins administrators\u0026#34; permissions: - \u0026#34;Overall/Administer\u0026#34; entries: - user: \u0026#34;admin\u0026#34; - name: \u0026#34;readonly\u0026#34; description: \u0026#34;Read-only users\u0026#34; permissions: - \u0026#34;Overall/Read\u0026#34; - \u0026#34;Job/Read\u0026#34; entries: - user: \u0026#34;authenticated\u0026#34; crumbIssuer: \u0026#34;standard\u0026#34; credentials: system: domainCredentials: - credentials: - usernamePassword: scope: SYSTEM id: admin username: admin password: password With this, we can start Jenkins, but it has no jobs that we can use.
We should look further into this.\nIntroduce Jobs and Pipelines in our Configured Jenkins # To make our initially configured Jenkins useful, it needs to have some pipeline jobs that we can use immediately. There are 2 things that we need here:\nHave the Job DSL plugin installed on our Jenkins setup. We then define jobs that create the pipelines, and set those jobs to be run from our configuration as code Jenkins configuration. This is needed since there doesn\u0026rsquo;t seem to be a way to define pipelines straight from configuration as code. Create the Jenkinsfiles that define the pipeline jobs we will be using on our Jenkins server. For the simplest job, we load a \u0026ldquo;job\u0026rdquo; that defines the pipeline from a Jenkinsfile stored in our docker image. It does not need to refer to any git system to pull in a Jenkinsfile. For the Jenkinsfile, we will simply run echo commands:\npipeline { agent any stages { stage(\u0026#39;Build\u0026#39;) { steps { echo \u0026#39;Building..\u0026#39; } } stage(\u0026#39;Test\u0026#39;) { steps { echo \u0026#39;Testing..\u0026#39; } } stage(\u0026#39;Deploy\u0026#39;) { steps { echo \u0026#39;Deploying....\u0026#39; } } } } To define this pipeline, we need the following job definition, which reads the Jenkinsfile copied into the image and uses it as the pipeline script.\nString fileContents = new File(\u0026#39;/home/pipelines/firstjob.Jenkinsfile\u0026#39;).text pipelineJob(\u0026#34;firstjob\u0026#34;) { parameters { stringParam(\u0026#39;name\u0026#39;, \u0026#34;\u0026#34;, \u0026#39;name of the person\u0026#39;) } definition { cps { script(fileContents) sandbox() } } } We then need to install the job-dsl plugin as well as add the above files in the Dockerfile.\nFROM jenkins/jenkins:latest COPY plugins.txt /var/jenkins_home/plugins.txt RUN /bin/jenkins-plugin-cli -f /var/jenkins_home/plugins.txt COPY jenkins.yaml /var/jenkins_home/jenkins.yaml ENV JAVA_OPTS \u0026#34;-Djenkins.install.runSetupWizard=false ${JAVA_OPTS:-}\u0026#34;
ENV CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins.yaml COPY jobs /home/jobs COPY pipelines /home/pipelines For the plugin text file\nace-editor apache-httpcomponents-client-4-api authentication-tokens blueocean blueocean-autofavorite blueocean-bitbucket-pipeline blueocean-commons blueocean-config blueocean-core-js blueocean-dashboard blueocean-display-url blueocean-events blueocean-git-pipeline blueocean-github-pipeline blueocean-i18n blueocean-jwt blueocean-personalization blueocean-pipeline-api-impl blueocean-pipeline-editor blueocean-pipeline-scm-api blueocean-rest blueocean-rest-impl blueocean-web bootstrap5-api branch-api caffeine-api checks-api cloudbees-bitbucket-branch-source cloudbees-folder credentials credentials-binding display-url-api durable-task echarts-api favorite font-awesome-api git git-client github github-api github-branch-source handy-uri-templates-2-api htmlpublisher jackson2-api javax-activation-api javax-mail-api jaxb jenkins-design-language jjwt-api jquery3-api jsch junit mailer matrix-project role-strategy okhttp-api pipeline-build-step pipeline-graph-analysis pipeline-groovy-lib pipeline-input-step pipeline-milestone-step pipeline-model-api pipeline-model-definition pipeline-model-extensions pipeline-stage-step pipeline-stage-tags-metadata plain-credentials plugin-util-api popper2-api pubsub-light scm-api script-security snakeyaml-api sse-gateway ssh-credentials structs token-macro trilead-api variant workflow-api workflow-basic-steps workflow-cps workflow-durable-task-step workflow-job workflow-multibranch workflow-scm-step workflow-step-api workflow-support configuration-as-code job-dsl For the configuration as code jenkins yaml file.\njenkins: systemMessage: Jenkins managed via Configuration as Code securityRealm: local: allowsSignup: false users: - id: admin password: password authorizationStrategy: roleBased: roles: global: - name: \u0026#34;admin\u0026#34; description: \u0026#34;Jenkins administrators\u0026#34; permissions: - 
\u0026#34;Overall/Administer\u0026#34; entries: - user: \u0026#34;admin\u0026#34; - name: \u0026#34;readonly\u0026#34; description: \u0026#34;Read-only users\u0026#34; permissions: - \u0026#34;Overall/Read\u0026#34; - \u0026#34;Job/Read\u0026#34; entries: - user: \u0026#34;authenticated\u0026#34; crumbIssuer: \u0026#34;standard\u0026#34; credentials: system: domainCredentials: - credentials: - usernamePassword: scope: SYSTEM id: admin username: admin password: password jobs: - file: /home/jobs/firstjob.groovy This gives us a Jenkins docker image that already has the first job available on initial login.\nAdding a second job # Let\u0026rsquo;s add another job that pulls its Jenkinsfile from a repo. This is probably more typical of real-world setups - if we update the Jenkinsfile, there is no need to rebuild the Jenkins docker image; the pipeline simply pulls the updated pipeline definition from the repo.\nThe updated jenkins.yaml now also registers the second job:\njenkins: systemMessage: Jenkins managed via Configuration as Code securityRealm: local: allowsSignup: false users: - id: admin password: password authorizationStrategy: roleBased: roles: global: - name: \u0026#34;admin\u0026#34; description: \u0026#34;Jenkins administrators\u0026#34; permissions: - \u0026#34;Overall/Administer\u0026#34; entries: - user: \u0026#34;admin\u0026#34; - name: \u0026#34;readonly\u0026#34; description: \u0026#34;Read-only users\u0026#34; permissions: - \u0026#34;Overall/Read\u0026#34; - \u0026#34;Job/Read\u0026#34; entries: - user: \u0026#34;authenticated\u0026#34; crumbIssuer: \u0026#34;standard\u0026#34; credentials: system: domainCredentials: - credentials: - usernamePassword: scope: SYSTEM id: admin username: admin password: password jobs: - file: /home/jobs/firstjob.groovy - file: /home/jobs/secondjob.groovy unclassified: # scmGit: # addGitTagAction: false # allowSecondFetch: false # createAccountBasedOnEmail: true # disableGitToolChooser: false # globalConfigEmail: jenkins@domain.local #
globalConfigName: jenkins # hideCredentials: true # showEntireCommitSummaryInChanges: true # useExistingAccountWithSameEmail: false location: url: http://localhost:8090 adminAddress: admin@jenkins.com The job definition (secondjob.groovy) that creates the second job:\npipelineJob(\u0026#34;secondjob\u0026#34;) { parameters { stringParam(\u0026#39;name\u0026#39;, \u0026#34;\u0026#34;, \u0026#39;name of the person\u0026#39;) } definition { cpsScm { scm { git { remote { url(\u0026#39;https://github.com/hairizuanbinnoorazman/Go_Programming\u0026#39;) } branch(\u0026#39;master\u0026#39;) } } scriptPath(\u0026#39;Environment/jenkins/pipelines/secondjob.Jenkinsfile\u0026#39;) } } } For testing purposes, secondjob.Jenkinsfile can simply be a copy of firstjob.Jenkinsfile.\nWe also need to configure the git tool within the Jenkins docker image:\nFROM jenkins/jenkins:latest COPY plugins.txt /var/jenkins_home/plugins.txt RUN /bin/jenkins-plugin-cli -f /var/jenkins_home/plugins.txt COPY jenkins.yaml /var/jenkins_home/jenkins.yaml ENV JAVA_OPTS \u0026#34;-Djenkins.install.runSetupWizard=false ${JAVA_OPTS:-}\u0026#34; ENV CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins.yaml RUN git config --global user.email \u0026#34;you@example.com\u0026#34; \u0026amp;\u0026amp; \\ git config --global user.name \u0026#34;Your Name\u0026#34; COPY jobs /home/jobs COPY pipelines /home/pipelines With those files, we now have a Jenkins server with 2 jobs that run simple \u0026ldquo;echo\u0026rdquo; steps.\nI will probably dive deeper into this setup to see how else we can extend the functionality of such an automated Jenkins setup.\n","date":"3 January 2024","externalUrl":null,"permalink":"/automating-jenkins-initial-setup/","section":"Posts","summary":"Jenkins, a pretty popular Continuous Integration/Continuous Deployment (CI/CD) build tool, plays a pivotal role in automating the software development/deployment process.
Over the years, Jenkins has evolved to become an extremely versatile automation server that facilitates continuous integration and delivery by orchestrating the building, testing, and deployment of code. Its extensibility through a vast array of plugins makes it adaptable to various environments and development workflows.\n","title":"Automating Jenkins Initial Setup","type":"posts"},{"content":" Introduction # When one mentions application packaging, the first thought that usually crosses a person\u0026rsquo;s mind is how the application would be packaged in docker containers. That is a fair thing to think about - containers have gotten pretty common in developer circles. Tools such as docker or podman make it especially simple to write a straightforward file that includes the application into a nice package. With this package, the people involved in running it in production environments only deal with a single artifact.\nThere are many other possible ways to package applications. One alternative is to put the application into a Virtual Machine image. If you use Amazon Web Services, you would copy the application and any necessary configuration into an instance. Once the service is observed to be in a decent state (running in a stable manner), we can shut off the instance and \u0026ldquo;export\u0026rdquo; it as an Amazon Machine Image. Whenever we need to run an instance, we can ask AWS to use our Amazon Machine Image as the template virtual machine image and immediately start our application servers. There is no further need to install and copy our application binaries, configuration etc. One tool that can help with this is terraform, along with HashiCorp\u0026rsquo;s Packer, a popular tool for building virtual machine images.
Sadly though, each cloud and each hypervisor has its own format for the image itself. In AWS, we would need an AMI (Amazon Machine Image); in Google Cloud, we would have Compute Engine images - and they are all different from each other.\nFor this blog post, I will be focusing on another alternative for packaging applications: RPMs. RPM is a common packaging format when working with CentOS or Red Hat Linux distributions. These OSes are often used in the enterprise world - so it\u0026rsquo;s pretty likely that you will come across them.\nBuilding a RPM with Golang application # For this blog post, I will cover how to build an RPM that contains a Golang application. Upon installing the RPM on a linux machine, it should start the Golang application server, which should then be managed by Systemd - there are a bunch of files that we need to create as well as a bunch of commands that we need to run in order to get it running.\nIn order to build our RPM, we need to create an RPM spec file.\nName: basic Version: 0 Release: 1 Summary: RPM package to contain basic Golang app License: FIXME %description RPM package to encapsulate basic golang application %prep # we have no source, so nothing here %build # Built using Golang docker image %install mkdir -p %{buildroot}%{_bindir} mkdir -p %{buildroot}/etc/systemd/system/ install -m 755 app %{buildroot}%{_bindir}/app install -m 644 app.service %{buildroot}/etc/systemd/system/basic.service %files %{_bindir}/app /etc/systemd/system/basic.service %pre getent group app \u0026gt;/dev/null 2\u0026gt;\u0026amp;1 || groupadd app getent passwd app \u0026gt;/dev/null 2\u0026gt;\u0026amp;1 || useradd -g app app %post chown app:app %{_bindir}/app systemctl daemon-reload systemctl enable basic.service systemctl start basic.service %preun systemctl stop basic.service systemctl disable basic.service systemctl daemon-reload # systemctl reset-failed - not sure
if needed here %postun userdel app groupdel app %changelog # let\u0026#39;s skip this for now In order to build our RPMs, we need CentOS, Rocky or a similar OS, since it provides the tooling used for packaging RPMs. However, the default CentOS or Rocky environments don\u0026rsquo;t come with the language runtimes that we might need to compile our application. In our case, we need the Go toolchain - and there doesn\u0026rsquo;t seem to be any convenient virtual machine or docker image that has Golang within a CentOS environment.\nGiven this, the sane approach here is to rely on multistage docker builds.\nFROM golang:1.18 as builder WORKDIR /helloworld ADD . . RUN CGO_ENABLED=0 go build -o app . FROM rockylinux:8 as rpm-builder RUN dnf install -y gcc rpm-build rpm-devel rpmlint make python3.11 bash diffutils patch rpmdevtools WORKDIR /helloworld COPY basic.spec . RUN rpmdev-setuptree COPY --from=builder /helloworld/app /root/rpmbuild/BUILD/app COPY ./deployment/bin/app.service /root/rpmbuild/BUILD/app.service RUN rpmbuild -ba basic.spec FROM scratch COPY --from=rpm-builder /root/rpmbuild/RPMS / The first stage relies on a Golang docker image and simply builds our Golang application into a static binary (CGO_ENABLED=0). The second stage builds our rpm, which contains the compiled Golang application and the systemd configuration. The built rpm is responsible for moving the golang application to the right folder as well as setting up the systemd files that manage the golang application. To get the built RPM out, the final stage copies it into a scratch image, and the --output flag of docker build (which requires BuildKit) then exports the contents of that image to a folder on our host machine. We will run the above Dockerfile with the following command:\ndocker build -f \u0026lt;dockerfile location\u0026gt; -t rpmbuilder --output out .
The resulting RPM will be in the out folder, under a subfolder for its architecture (e.g. out/x86_64).\nTesting the built RPM # We can test our built RPM on a virtual machine provisioned from any cloud provider. We can\u0026rsquo;t fully test it in a docker container since our RPM relies on systemd, which doesn\u0026rsquo;t really exist in container land (systemd expects to run as PID 1, whereas in containers PID 1 is usually the command defined via ENTRYPOINT/CMD or passed from the docker CLI).\nWe can simply use scp to copy the RPM over to our server.\nOnce the RPM is on the machine, we can install it by running the following command:\nrpm -Uvh basic-0-1.x86_64.rpm To uninstall the rpm, we first list what is installed on our server\nrpm -qa | grep basic And then remove (\u0026ldquo;erase\u0026rdquo;) it\nrpm -e basic-0-1.x86_64 The above commands are simply examples - modify them according to the version specified in your rpm spec.\nCompute Engine VM to utilize RPMs from Artifact Registry # Naturally, once we have all these RPMs, it would be ideal to store them someplace. We can technically store all of these RPMs in GCS, fetch them as file blobs and install them manually. However, yum/dnf has a mechanism for pulling rpms from a repository. If a new version is published, it can work out that the new version is available for download and install. It would definitely be better to utilize that mechanism.\nGoogle Cloud has a place for that - Artifact Registry. We can set up a yum repository in it, and then configure Compute Engine VMs to install the rpms from it. We can create this repository via the UI in the Google Cloud console.
Once the yum repository has been created, we can push our rpms to it.\ngcloud artifacts yum upload demo --location=us-east1 --source=./out/x86_64/basic-0-1.x86_64.rpm Modify the --source path in the above command to point to where the rpm was generated.\nThe next step is to create a Google Compute Engine VM. Do note that it is important to give the VM sufficient privileges to access Artifact Registry. If we\u0026rsquo;re using the default Compute Engine service account, ensure that the VM\u0026rsquo;s access scopes enable \u0026ldquo;Cloud Platform\u0026rdquo; access.\nThen, on the VM, run the following commands:\n# To configure your package manager with this repository: # Update Yum: sudo yum makecache # Install the Yum credential helper: sudo yum install dnf-plugin-artifact-registry # Configure your VM to access Artifact Registry packages using the following # command: sudo tee -a /etc/yum.repos.d/artifact-registry.repo \u0026lt;\u0026lt; EOL [demo] name=demo baseurl=https://us-east1-yum.pkg.dev/projects/healthy-rarity-238313/demo enabled=1 repo_gpgcheck=0 gpgcheck=0 EOL # Update Yum: sudo yum makecache The makecache output now includes the demo repository:\n$ sudo yum makecache Rocky Linux 8 - Cloud Kernel 37 kB/s | 3.4 kB 00:00 Rocky Linux 8 - AppStream 28 kB/s | 4.8 kB 00:00 Rocky Linux 8 - BaseOS 45 kB/s | 4.3 kB 00:00 Rocky Linux 8 - Extras 29 kB/s | 3.1 kB 00:00 demo 3.1 kB/s | 967 B 00:00 Google Compute Engine 12 kB/s | 1.4 kB 00:00 Google Cloud SDK 37 kB/s | 1.4 kB 00:00 Metadata cache created. Now we can try to install it:\n$ sudo dnf install basic Error: This command has to be run with superuser privileges (under the root user on most systems). [hairizuan@instance-1 ~]$ sudo dnf install basic Last metadata expiration check: 0:00:46 ago on Sat Dec 23 02:22:21 2023. Dependencies resolved.
========================================================================================================================= Package Architecture Version Repository Size ========================================================================================================================= Installing: basic x86_64 0-1 demo 1.8 M Transaction Summary ========================================================================================================================= Install 1 Package Total download size: 1.8 M Installed size: 4.9 M Is this ok [y/N]: If we need to update the rpm - i.e. we build a later version of the basic rpm and make it available on Artifact Registry - we can run:\ndnf update basic Running the install command again should also pick up the newer version from the repository:\ndnf install basic To remove our package, we can run the following command:\ndnf remove basic Conclusion # The RPM created in this post is pretty simple and doesn\u0026rsquo;t cover many of the features that RPMs generally support. The pld-linux GitHub link below provides rpm spec files for many other rpms, which are useful as further examples.\nMaybe in the future, I\u0026rsquo;ll write another blog post if I come across an interesting feature while building RPMs.\nReferences:\nSome of the macros available in rpm spec\nhttps://docs.fedoraproject.org/en-US/packaging-guidelines/RPMMacros/ https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax Some examples of rpm spec files.\nhttps://github.com/pld-linux Maximum RPM guide book http://rpm5.org/docs/max-rpm.html ","date":"27 December 2023","externalUrl":null,"permalink":"/building-rpms-and-storing-it-in-artifact-registry/","section":"Posts","summary":"Introduction # When one mentions application packaging, the first thought that usually crosses a person’s mind is how the application would be packaged in docker containers.
That is a fair thing to think about, since containers have become pretty common in developer circles. Tools such as docker or podman make it especially simple to write a straightforward file that packages an application into a nice artifact. With this package, the people involved with running it in production environments only deal with a single artifact.\n","title":"Building RPMs and storing it in Artifact Registry","type":"posts"},{"content":"I have a basic shopping list application that is available in the following code base: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/shopping-list. This is a simple Golang application that embeds JavaScript files transpiled from Elm code. Together with the required CSS and HTML files, these form the frontend of the shopping list. The frontend calls some backend APIs that store shopping list items in a datastore, which in this case is Google Cloud Datastore (a NoSQL database)\nTLDR version:\nFrontend ELM -\u0026gt; transpiled to Javascript CSS HTML Backend Golang Frontend is embedded into Golang Application is baked into a Docker image Database is Google Cloud Datastore Deployment Deployed on Google Cloud Run in Singapore region (GCP) CICD pipelines are set up in Github actions Manual deployment # Before any CICD pipeline was created, the application was deployed manually from a developer\u0026rsquo;s workstation. The commands can sometimes be rather long and complicated, and hence Makefiles were created to simplify the commands used to deploy the application.\nIn order to get the application to \u0026ldquo;production\u0026rdquo;, we need to run various steps that prepare the files to be included in our docker image. Do take note that our frontend for the shopping list is written using Elm. 
Browsers do not run Elm directly, so we need to compile the Elm code base into JavaScript (which browsers understand). The generated JavaScript is then \u0026ldquo;uglified\u0026rdquo; and \u0026ldquo;minified\u0026rdquo;, which shrinks the files and makes the frontend code base harder for attackers to study when looking for ways to attack the backend.\nThis step is all encapsulated in one make command:\nmake gen-prod This command generates our uglified, minified JavaScript and moves it to a static folder. This step is vital, as the production docker image then loads these files and embeds them into the static binary. The make command to build the production docker image is:\nmake docker-prod This make command creates the docker image and pushes it to Container Registry (which is deprecated - remember this for later)\nThere is no make command to create or update the Cloud Run service; this was done manually via the UI (usually for demo purposes)\nUsing CICD Github actions # This worked fine since there is only one developer working on it after all (me), but naturally it would be great to have a CICD pipeline for it. I would like the same experience I have with this blog, where changes go straight to production. On any code change, the relevant artifacts should be built and pushed to their respective registries, and the Cloud Run service should be updated to the newly built image.\nThe GitHub Actions workflow file is here: https://github.com/hairizuanbinnoorazman/Go_Programming/blob/master/.github/workflows/shopping.yml\nSeeing that the whole Go_Programming repository is on GitHub, we might as well rely on GitHub Actions to do this (which sounds quite feasible). 
There are a few things that we need to tackle to get the whole CI CD pipeline working.\nTest the code (maybe next time?) Build the docker image from the code Push the docker image into a private registry Update our Cloud Run service to use the new docker image For now, we skip the first step; it might be worth adding next time to make the entire process \u0026ldquo;safer\u0026rdquo;. That step would involve running the server temporarily and hammering it with curl requests, mostly to check that the application can store as well as return data records.\nThe next step is to build the docker image and push it to a registry. My initial approach was to follow the manual approach of pushing the docker image into the registry. However, this requires our GitHub Actions runner to have the same permissions as my workstation. We would then run a step that configures the docker CLI with those credentials so that it can push the image to the private Container Registry on Google Cloud.\nThere are 2 problems here. The first is that we need to figure out how to pass credentials to our GitHub Actions runner. On our workstation, we can simply run gcloud auth login, which prompts us interactively to authenticate the gcloud command. We definitely cannot do this on GitHub Actions since it is a non-interactive environment (also, imagine how irritating it would be to be prompted on every run to confirm that you are authenticating the gcloud command from a machine that you control)\nOne solution is to create a JSON credentials file for a newly created service account that has all the permissions we need. The JSON credentials file can be stored as a GitHub Actions secret and then loaded into our GitHub Actions job. 
However, from reading around the documentation, this is not the recommended approach. The argument is that we now have a long-lived credentials file to manage, and this file could be very powerful: if leaked, it might be used to alter large swathes of our application infrastructure. That wouldn\u0026rsquo;t be ideal.\nThe second approach is a relatively new one: Workload Identity Federation. Google Cloud is configured to trust identity tokens issued by GitHub via OIDC. (I\u0026rsquo;m not too sure about the details of how it works - it might be worth an entire blog post to understand how OIDC works). Our GitHub Actions runner presents a signed OIDC token to our Google Cloud project; once the presented information is correct (e.g. the correct GitHub user or the correct GitHub repository), the runner is authorized to perform the actions it needs to update our Cloud Run service.\nHere are some useful resources to find out more about workload identity federation\nhttps://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions https://cloud.google.com/iam/docs/workload-identity-federation-with-other-providers https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#configuring-the-subject-in-your-cloud-provider https://github.com/google-github-actions/auth The rough gist of the steps I took for setting up CI CD with GitHub Actions and workload identity federation is as follows:\nCreate a pool for Workload Identity Federation Create a service account with the following permissions: Service account token creation Service account OIDC token creation (needs confirmation whether this is required) Cloud Run Developer (permissions to update our Cloud Run service) Artifact Registry Writer (permissions to push to Artifact Registry) 
Service Account User (surprisingly, without it we were unable to update the Cloud Run service) Grant the created Workload Identity Federation pool access to the service account. We don\u0026rsquo;t need to save the file; we can simply skip the \u0026ldquo;configuration\u0026rdquo; file that tells client libraries how to connect via workload identity Set the \u0026ldquo;subject\u0026rdquo; for the service account mapping to: \u0026ldquo;repo:hairizuanbinnoorazman/Go_Programming:ref:refs/heads/master\u0026rdquo; The second problem is with regard to IAM permissions. Surprisingly, there seems to be no suitable IAM role that we can rely on to push built docker images to Container Registry. Some Stack Overflow posts mention that we can simply grant the whole Cloud Storage Admin role, but apparently that doesn\u0026rsquo;t work as expected. There may be some hidden configuration that needs to be turned on in the Google Cloud console. After tweaking around, the only way I could get docker images into Container Registry was to use Editor permissions (which is a big no-no here). 
However, seeing that Container Registry is already being sunset, it is better to move on and use Artifact Registry, which conveniently has specific IAM roles (such as Artifact Registry Writer).\nAll of the above is mostly about authentication and permissions, which can be summed up in a single GitHub Actions step.\n- id: \u0026#39;auth\u0026#39; name: \u0026#39;Authenticate to Google Cloud\u0026#39; uses: \u0026#39;google-github-actions/auth@v2\u0026#39; with: workload_identity_provider: \u0026#34;projects/${{ secrets.GCP_PROJECT_ID }}/locations/global/workloadIdentityPools/hairizuan-personal-github/providers/golang-programming\u0026#34; service_account: \u0026#39;github-actions@${{ secrets.GCP_PROJECT }}.iam.gserviceaccount.com\u0026#39; Another critical step in this workflow is giving docker the credentials it needs to push our built images into our private Artifact Registry.\n- name: \u0026#34;Docker auth\u0026#34; run: |- gcloud auth configure-docker asia-southeast1-docker.pkg.dev --quiet The rest of the steps are not too important to mention here; they are close to the commands we would type into a terminal, such as building docker images, pushing them into the registry, and using the gcloud CLI to replace the image used by our shopping list application.\n","date":"20 December 2023","externalUrl":null,"permalink":"/github-actions-for-shopping-list-application/","section":"Posts","summary":"I have a basic shopping list application that is available in the following code base: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/shopping-list. This is a simple Golang application that embeds JavaScript files transpiled from Elm code. Together with the required CSS and HTML files, these form the frontend of the shopping list. 
The frontend calls some backend APIs that store shopping list items in a datastore, which in this case is Google Cloud Datastore (a NoSQL database)\n","title":"Github actions for shopping list application","type":"posts"},{"content":"","date":"20 December 2023","externalUrl":null,"permalink":"/categories/serverless/","section":"Article Categories","summary":"","title":"Serverless","type":"categories"},{"content":"","date":"20 December 2023","externalUrl":null,"permalink":"/tags/serverless/","section":"Technology Tags","summary":"","title":"Serverless","type":"tags"},{"content":"Over the past few months, I have been toying with the idea of going all in with Ansible or all in with Terraform. Both are popular tools for deploying applications and infrastructure. After tinkering around, I eventually came to the conclusion that Terraform would be the \u0026ldquo;better\u0026rdquo; tool here. The main reason comes down to this: https://github.com/ansible-collections/google.cloud/issues/301 - it seems that Ansible is not as \u0026ldquo;supported\u0026rdquo; as Terraform, and the more I look, the more it seems that certain features I may want to use are missing. Rather than continue tinkering and hoping that something would change (sometimes such issues are never resolved; I could dig in and try to fix it myself, but I don\u0026rsquo;t feel like investing in this particular tool in depth), I decided to switch.\nAnd here we are, trying out Terraform to deploy some stuff in Google Cloud.\nInitial setup # One of the first things to build out is the following setup:\nSetup of instances within a private virtual network Generally done for better security - one way to secure servers is simply to prevent outsiders from unnecessarily accessing them Setup of bastion host This should be publicly accessible via ssh. 
This host serves as a \u0026ldquo;jump\u0026rdquo; server that allows an outside developer to access the various instances within the private network Ensure that instances within the private virtual network are able to access the internet If one were to create a Google Compute Engine instance with no external IP address, you would realize that the instance can\u0026rsquo;t access the internet. For such instances to be able to access the internet, we need to set up a NAT (Cloud NAT) for the VPC network; from then on, they should be able to access the internet. The first thing I deploy to check that internet access works is docker, on the instance within the private network. I use docker in various experiments, so it is usually one of the first things I need to deploy.\nThe code for this is here:\nhttps://github.com/hairizuanbinnoorazman/terraform/tree/main/google\nThe portion to deploy the bastion host is here:\nhttps://github.com/hairizuanbinnoorazman/terraform/tree/main/google/bastion\nWe don\u0026rsquo;t need to run any further commands on the bastion host as of now (maybe in the future, I will research what tooling to install on the bastion host to harden it against attackers). As of now, I haven\u0026rsquo;t looked too deeply into bastion host tooling since I generally bootstrap and tear down the entire stack within the day.\nThe next critical piece to cover is how to install docker on the server. We need a way to inject the installation steps via Terraform. 
One approach, especially when using Google Cloud instances, is to rely on metadata_startup_script: a service within each instance runs this script when the instance starts up, which is what we want in this case.\nresource \u0026#34;google_compute_instance\u0026#34; \u0026#34;server\u0026#34; { count = var.service_meta[var.component].server_count zone = var.gcp_zone ... metadata_startup_script = data.local_file.script.content ... } We can read the script from a separate file rather than putting it all within the Terraform configuration itself. For docker, it is in the following file: https://github.com/hairizuanbinnoorazman/terraform/blob/main/google/server/scripts/docker. The file is read via Terraform\u0026rsquo;s local_file data source, and its content is passed into the google_compute_instance resource when creating our private server.\nWith all that in place, we can run the following commands from the google folder of the git repo:\n# If not yet initialized: terraform init # Convenience function for creating a docker with bastion host components=\u0026#39;[\u0026#34;docker\u0026#34;]\u0026#39; add_bastion=true make plan I accidentally added enough complex features that a make command is now needed. The make command is essentially a wrapper around terraform plan; flags such as where the plan output is written are already declared within the makefile. 
Other variables declared there include which GCP project the deployment targets.\ngcp_id=$(shell gcloud config get project) components?=[] add_bastion?=false plan: TF_VAR_gcp_project_id=$(gcp_id) terraform plan -out=initial.plan -var \u0026#39;components=$(components)\u0026#39; -var \u0026#39;enable_bastion=$(add_bastion)\u0026#39; destroy: TF_VAR_gcp_project_id=$(gcp_id) terraform plan -out=destroy.plan -destroy Once the plan is created, we can run:\nterraform apply initial.plan What\u0026rsquo;s available # If you look at the codebase as of now, it has already been configured to deploy a variety of common tools and services such as nginx, etcd and mariadb. The full list of what it can deploy will be kept up to date on the main Readme.md page of the git repo.\nEventually, I will cover other cases as well, such as deploying custom golang, ruby or even python applications into the various environments, with everything controlled via terraform. Look forward to that.\n","date":"13 December 2023","externalUrl":null,"permalink":"/using-terraform-for-deploying-databases-and-applications-in-google-cloud/","section":"Posts","summary":"Over the past few months, I have been toying with the idea of going all in with Ansible or all in with Terraform. Both are popular tools for deploying applications and infrastructure. After tinkering around, I eventually came to the conclusion that Terraform would be the “better” tool here. The main reason comes down to this: https://github.com/ansible-collections/google.cloud/issues/301 - it seems that Ansible is not as “supported” as Terraform, and the more I look, the more it seems that certain features I may want to use are missing. 
Rather than continue tinkering and hoping that something would change (sometimes such issues are never resolved; I could dig in and try to fix it myself, but I don’t feel like investing in this particular tool in depth), I decided to switch.\n","title":"Using Terraform for deploying databases and applications in Google Cloud","type":"posts"},{"content":"","date":"1 November 2023","externalUrl":null,"permalink":"/categories/microservices/","section":"Article Categories","summary":"","title":"Microservices","type":"categories"},{"content":"","date":"1 November 2023","externalUrl":null,"permalink":"/tags/microservices/","section":"Technology Tags","summary":"","title":"Microservices","type":"tags"},{"content":"","date":"1 November 2023","externalUrl":null,"permalink":"/categories/nginx/","section":"Article Categories","summary":"","title":"Nginx","type":"categories"},{"content":"","date":"1 November 2023","externalUrl":null,"permalink":"/tags/nginx/","section":"Technology Tags","summary":"","title":"Nginx","type":"tags"},{"content":"On virtual machine How to \u0026ldquo;protect\u0026rdquo; api requests https://www.nginx.com/blog/deploying-nginx-plus-as-an-api-gateway-part-1/\nMostly about the auth_request directive\nMicroservices are a software architectural style that structures an application as a collection of loosely coupled, independently deployable services. Each service in a microservices architecture represents a specific business capability and communicates with other services through well-defined APIs (Application Programming Interfaces). These services are designed to be small, focused, and can be developed, deployed, and scaled independently. It\u0026rsquo;s a fairly common architectural pattern that many companies turn to when scaling out their development teams to build their product.\nWhile microservices offer several advantages, managing communication and interaction between them can become complex as the number of services increases. 
This is where an API Gateway becomes crucial. Some of the advantages of introducing an API Gateway are:\nUnified Entry Point Protocol Translation Security and Authentication Load Balancing Monitoring and Analytics For this blog post, let\u0026rsquo;s explore how we can put nginx in front of a bunch of services and then tackle the authentication aspect of securing them. Out of convenience, we set up our applications and nginx as docker containers, orchestrated with the docker compose tool.\nMain application # Our main application simply returns a short text response with a 200 OK status. It has only one root endpoint (/), which responds to requests of any HTTP method.\npackage main import ( \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;github.com/gorilla/mux\u0026#34; ) type basic struct{} func (b basic) ServeHTTP(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;started basic handler\u0026#34;) defer log.Println(\u0026#34;ended basic handler\u0026#34;) w.Write([]byte(\u0026#34;successfully called basic handler\u0026#34;)) } func main() { log.Print(\u0026#34;App started\u0026#34;) r := mux.NewRouter() r.Handle(\u0026#34;/\u0026#34;, basic{}) srv := http.Server{ Handler: r, Addr: \u0026#34;0.0.0.0:8080\u0026#34;, } log.Fatal(srv.ListenAndServe()) } The docker image for it would be something like so:\nFROM golang:1.21 as builder WORKDIR /helloworld COPY . . 
RUN CGO_ENABLED=0 go build -o app ./cmd/app FROM debian:bookworm-slim RUN apt update \u0026amp;\u0026amp; \\ apt install -y ca-certificates \u0026amp;\u0026amp; \\ apt clean \u0026amp;\u0026amp; \\ rm -rf /var/lib/apt/lists/* WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 For the docker image, we build the binary in a builder stage and then copy it over to a Debian image.\nWe can test our application by starting the docker image and sending a request to the root endpoint:\ncurl localhost:8080/ Auth application # Our auth application provides a few endpoints:\n/ - A root endpoint that serves a webpage with a form where we can put in a username and password /signin - An endpoint that checks the submitted username and password. If they are not correct, it returns a 401 Unauthorized response. /auth - This is simply an endpoint that checks whether a cookie is set. If the cookie is set, the user/browser is treated as \u0026ldquo;valid\u0026rdquo;. Normally, we would need to check that the user is valid and still authenticated. 
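To make the contract between nginx and this /auth endpoint concrete, here is a small Python model (a sketch only, not nginx or Go code) of how the nginx auth_request directive is documented to treat the status code of its subrequest: any 2xx response lets the original request through, 401 or 403 is passed back to the client, and any other status is treated as an error (500). The function names auth_endpoint_status and gate are hypothetical; the cookie check simply mirrors the behaviour described above, where a cookie named test is enough.

```python
from typing import Dict, Optional

# Hypothetical model of the auth flow in this post. It is not nginx source
# code, only a sketch of the documented auth_request rules.


def auth_endpoint_status(cookies: Dict[str, str]) -> int:
    # Mirrors the /auth endpoint: a cookie named test is treated as valid.
    return 200 if 'test' in cookies else 401


def gate(subrequest_status: int) -> Optional[int]:
    # None means nginx would continue to proxy_pass the original request;
    # otherwise this is the status code nginx would send to the client.
    if 200 <= subrequest_status < 300:
        return None
    if subrequest_status in (401, 403):
        return subrequest_status
    return 500


if __name__ == '__main__':
    for jar in ({'test': 'test-cookie'}, {}):
        decision = gate(auth_endpoint_status(jar))
        outcome = 'proxied to app' if decision is None else f'rejected with {decision}'
        print(jar, outcome)
```

Because the Go /auth handler below returns 401 when the cookie is missing, nginx passes that 401 straight back to the browser instead of forwarding the request to the main application.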
This would be the golang application:\npackage main import ( \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;text/template\u0026#34; \u0026#34;github.com/gorilla/mux\u0026#34; ) type signinPage struct{} func (b signinPage) ServeHTTP(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;started signin-page handler\u0026#34;) defer log.Println(\u0026#34;ended signin-page handler\u0026#34;) tmpl := template.Must(template.ParseFiles(\u0026#34;layout.html\u0026#34;)) tmpl.Execute(w, nil) } type signin struct{} func (b signin) ServeHTTP(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;started signin handler\u0026#34;) defer log.Println(\u0026#34;ended signin handler\u0026#34;) name := r.FormValue(\u0026#34;name\u0026#34;) password := r.FormValue(\u0026#34;password\u0026#34;) if name == \u0026#34;admin\u0026#34; \u0026amp;\u0026amp; password == \u0026#34;password\u0026#34; { cookie := http.Cookie{ Name: \u0026#34;test\u0026#34;, Value: \u0026#34;test-cookie\u0026#34;, Path: \u0026#34;/\u0026#34;, } http.SetCookie(w, \u0026amp;cookie) w.Write([]byte(\u0026#34;successfully login\u0026#34;)) return } w.WriteHeader(http.StatusUnauthorized) w.Write([]byte(\u0026#34;unauthorized login\u0026#34;)) } type auth struct{} func (a auth) ServeHTTP(w http.ResponseWriter, r *http.Request) { _, err := r.Cookie(\u0026#34;test\u0026#34;) if err == nil { log.Println(\u0026#34;cookie found, will return 200 ok\u0026#34;) w.WriteHeader(http.StatusOK) w.Write([]byte(\u0026#34;cookie found - successfully in\u0026#34;)) return } w.WriteHeader(http.StatusUnauthorized) w.Write([]byte(\u0026#34;invalid\u0026#34;)) } func main() { log.Print(\u0026#34;Auth started\u0026#34;) r := mux.NewRouter() r.Handle(\u0026#34;/\u0026#34;, signinPage{}) r.Handle(\u0026#34;/signin\u0026#34;, signin{}) r.Handle(\u0026#34;/auth\u0026#34;, auth{}) srv := http.Server{ Handler: r, Addr: \u0026#34;0.0.0.0:8080\u0026#34;, } log.Fatal(srv.ListenAndServe()) } The frontend part that 
would allow us to key in username and password would be:\n\u0026lt;html\u0026gt; \u0026lt;head\u0026gt;\u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;h1\u0026gt;Sign In Page\u0026lt;/h1\u0026gt; \u0026lt;form action=\u0026#34;/api/v1/auth/signin\u0026#34; method=\u0026#34;post\u0026#34;\u0026gt; \u0026lt;label for=\u0026#34;name\u0026#34;\u0026gt;Name:\u0026lt;/label\u0026gt;\u0026lt;br\u0026gt; \u0026lt;input type=\u0026#34;text\u0026#34; id=\u0026#34;name\u0026#34; name=\u0026#34;name\u0026#34;\u0026gt;\u0026lt;br\u0026gt; \u0026lt;label for=\u0026#34;password\u0026#34;\u0026gt;Password:\u0026lt;/label\u0026gt;\u0026lt;br\u0026gt; \u0026lt;input type=\u0026#34;password\u0026#34; id=\u0026#34;password\u0026#34; name=\u0026#34;password\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;submit\u0026#34; value=\u0026#34;Submit\u0026#34;\u0026gt; \u0026lt;/form\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; The docker image for our auth application:\nFROM golang:1.21 as builder WORKDIR /helloworld COPY . . RUN CGO_ENABLED=0 go build -o app ./cmd/auth FROM debian:bookworm-slim RUN apt update \u0026amp;\u0026amp; \\ apt install -y ca-certificates \u0026amp;\u0026amp; \\ apt clean \u0026amp;\u0026amp; \\ rm -rf /var/lib/apt/lists/* WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app COPY ./cmd/auth/layout.html /helloworld/layout.html CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 With that, we can set up the docker container and use the browser to check that the auth application works. We can go through the endpoints in the following order.\nGo to the / endpoint. It renders an HTML page where the user can enter a username and password. Submitting the form sends the user to the /signin endpoint Go to the /signin endpoint. This endpoint compares the username and password against hard-coded values and, if they match, returns a cookie to the browser Go to the /auth endpoint, which simply checks that the cookie is set. 
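Before wiring the containers together, it may help to see how the nginx configuration in the next section routes request paths. The Python sketch below is a simplified, hypothetical model (route, EXACT and REGEX_LOCATIONS are made-up names, not nginx APIs): exact-match locations are checked first, then the regex locations in the order they appear, and anything else falls through to the catch-all location / that serves static files. The rewrite ... $1 break; step is modelled by keeping only what follows the /api/v1/basic or /api/v1/auth prefix.

```python
import re
from typing import Optional, Tuple

# Simplified, hypothetical model of the location matching in this post.
# It is not nginx: it only follows the order nginx applies for this config
# (exact-match locations, then regex locations in file order, then the
# catch-all prefix location /).
EXACT = {
    '/auth': 'internal location: 404 for external clients',
    '/api/v1/auth/auth': 'return 404',
}
REGEX_LOCATIONS = [
    # (location regex, rewrite regex whose group 1 becomes the new URI, upstream)
    ('^/api/v1/basic/', '^/api/v1/basic(.*)', 'http://app:8080'),
    ('^/api/v1/auth/', '^/api/v1/auth(.*)', 'http://auth:8080'),
]


def route(uri: str) -> Tuple[str, Optional[str]]:
    # Returns (where the request goes, URI sent upstream or None).
    if uri in EXACT:
        return EXACT[uri], None
    for location, rewrite, upstream in REGEX_LOCATIONS:
        if re.match(location, uri):
            # rewrite ^prefix(.*) $1 break; keeps only the part after the prefix
            return upstream, re.match(rewrite, uri).group(1)
    return 'static files under /usr/share/nginx/html', uri


if __name__ == '__main__':
    for uri in ['/api/v1/basic/', '/api/v1/basic', '/api/v1/auth/signin', '/api/v1/auth/auth']:
        print(uri, '->', route(uri))
```

In this model, /api/v1/basic without the trailing slash matches no regex location and ends up at the static-file location, which has no such file; that is where the 404 mentioned later in this post comes from.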
Setting up the entire application stack # Once the applications are available, we can set up all our containers via the docker compose tool.\nversion: \u0026#39;3.3\u0026#39; services: app: build: context: . dockerfile: app.Dockerfile restart: always auth: build: context: . dockerfile: auth.Dockerfile restart: always fw: image: nginx:1.25.3 ports: - 8080:80 restart: always volumes: - type: bind source: ./conf target: /etc/nginx/conf.d/ read_only: true For nginx, we can use the following configuration.\nserver { listen 80; listen [::]:80; server_name localhost; #access_log /var/log/nginx/host.access.log main; location ~ ^/api/v1/basic/ { auth_request /auth; rewrite ^/api/v1/basic(.*) $1 break; proxy_pass http://app:8080; proxy_set_header Host $http_host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; } location = /auth { internal; proxy_pass http://auth:8080; proxy_pass_request_body off; proxy_set_header Content-Length \u0026#34;\u0026#34;; proxy_set_header X-Original-URI $request_uri; } location = /api/v1/auth/auth { return 404; } location ~ ^/api/v1/auth/ { rewrite ^/api/v1/auth(.*) $1 break; proxy_pass http://auth:8080; proxy_set_header Host $http_host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; } location / { root /usr/share/nginx/html; index index.html index.htm; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root /usr/share/nginx/html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \\.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \\.php$ { # root html; # fastcgi_pass 127.0.0.1:9000; # fastcgi_index index.php; # fastcgi_param SCRIPT_FILENAME 
/scripts$fastcgi_script_name; # include fastcgi_params; #} # deny access to .htaccess files, if Apache\u0026#39;s document root # concurs with nginx\u0026#39;s one # #location ~ /\\.ht { # deny all; #} } Some of the important aspects of our nginx configuration:\n/api/v1/basic/ directs users to the main application, with the prefix stripped via the rewrite directive. Do take note of the trailing slash: without it (e.g. /api/v1/basic), the request does not match this location and falls through to location /, which returns a 404 error The /api/v1/basic/ location uses the auth_request directive, which makes a quick subrequest to the auth application to check that the user is still authenticated before the request is proxied. /api/v1/auth/ directs users to the auth application, again via the rewrite directive. The same trailing slash caveat applies: without it, a 404 error is returned Users can access the / and /signin paths of the auth application via /api/v1/auth/ and /api/v1/auth/signin. The auth application\u0026rsquo;s /auth endpoint is only used by nginx, through the internal /auth location, when the main application\u0026rsquo;s endpoint is accessed. (It would otherwise be reachable from outside via /api/v1/auth/auth, which is why that exact path is configured to return 404.) The internal /auth location forwards the check to the /auth endpoint of the auth application. ","date":"1 November 2023","externalUrl":null,"permalink":"/nginx-as-api-gateway-focusing-on-auth_request-directive/","section":"Posts","summary":"On virtual machine How to “protect” api requests https://www.nginx.com/blog/deploying-nginx-plus-as-an-api-gateway-part-1/\nMostly about the auth_request directive\nMicroservices are a software architectural style that structures an application as a collection of loosely coupled, independently deployable services. 
Each service in a microservices architecture represents a specific business capability and communicates with other services through well-defined APIs (Application Programming Interfaces). These services are designed to be small, focused, and can be developed, deployed, and scaled independently. 
It’s a fairly common architectural pattern that many companies turn to when scaling out their development teams to build their product.\n","title":"Nginx as API Gateway - focusing on auth_request directive","type":"posts"},{"content":" General framework for system design interviews # From the following video: https://www.youtube.com/watch?v=i7twT3x5yv8\nSpecify Requirements\nDesign High Level Functional Components\nDeep dive into specific \u0026ldquo;interesting\u0026rdquo; pieces of the components\nWrap up\nUsual points:\nAsk for the estimated number of users of the application to determine its scale To help find out the ratio of readers to writers. For any \u0026ldquo;feed\u0026rdquo;, ask about the \u0026ldquo;freshness\u0026rdquo; of data. How frequently does it need to be updated? (But does this matter here?) Technical requirements: Availability of system Is it ok for data to be eventually consistent, or is strong consistency a requirement here? Latency of content to be distributed Fault tolerance (able to withstand failures) Use a queue for potentially slow processes or processes whose load may suddenly spike Design an \u0026ldquo;Instagram\u0026rdquo; app # Business requirements User has a home screen which provides a feed of photos/videos of other people. Assumed that priority for the feed would be people \u0026ldquo;close\u0026rdquo; to the user User submits photos or videos and adds metadata to such assets when posting. E.g. tags User is able to vote for which photo they like best, which would help identify which content is \u0026ldquo;good\u0026rdquo; content User is able to \u0026ldquo;follow\u0026rdquo; other fellow users Is this a global app? Or just simply a regional app? It is assumed that this \u0026ldquo;instagram\u0026rdquo; would have the same business model - serve ads to make money Technical requirements How many users are expected to use the app on a daily basis? 
When accessing content, can it be assumed that we would want to reduce content sizes for users - in order to improve the responsiveness of the app and reduce the amount of bandwidth needed to show the content? High level components User service. Manages user information as well as the closeness of users to other users. Tracks follower information. Content Submission service. Handles content submitted by users. Involves processing the content for storage as well as serving it later as needed. Feed generation service. Generates the feed that each user consumes. Some of the information that it requires to generate the feed would be the closeness of the user to other relevant users, the recency of the post, the number of likes on the post etc. The feed generation service would probably use an algorithm/ML model trained on all this data to determine which feed would be best served to provide the greatest revenue/engagement. Metrics absorption service. Collects business metrics on how users engage with posts. Information such as: App usage time Number of posts a user went through Time spent on each post etc Ads service. Provides ads purchased by individuals/companies. Consumed within the feed. Design a live-streaming application # TODO: Read up further on this\nRequirements Take the video stream from the user\u0026rsquo;s web camera + microphone from the browser/app and encode the data for live streaming User is able to select the stream that is to be watched User is able to download the video at the right bitrate for the best viewing experience (e.g. a handphone has the smallest screen and doesn\u0026rsquo;t need to watch it in \u0026ldquo;True HD\u0026rdquo; fashion.) 
Technical Requirements Low latency between the time the video input is captured and the time viewers are watching the stream Video stream is available globally to everyone at the same time High level components Client application (browser) that is able to access the camera + microphone to record information. It will then encode the data and send it over RTMP/RTMPS to the server. Transcoding component. Component that will take the RTMP input and then run appropriate manipulations on it to downsize the data according to different bitrates. Apparently, ffmpeg is able to produce all these bitrates in a single pass. Client application (viewer) to view content in HLS format. Design a key value store # TODO: Research on the following:\nStorage Engine: SSTables, Bloom Filters, B-trees? (But these sound more relevant for relational DBs?)\nRequirements\nStore a set of keys mapped to their values in a persistent fashion. Data is not lost on shut down. Able to take in a moderate number of connections with no issues when pushing/getting data Data stored on disk cannot be in clear text format Able to operate as a cluster Technical requirements\nLow latency when retrieving and saving data in the datastore High level components:\nCluster components: Leader election. Needed when clusters are formed. Certain operations where only 1 operation can succeed require a leader to be available and to decide what to do next. Data moving subcomponent. Move data between nodes in the cluster Partitioning of data component. In this case, consistent hashing may be the best solution to prevent too much data from moving around. Memberlist. To see who\u0026rsquo;s part of the cluster and who\u0026rsquo;s not. If a node is no longer healthy, we need to start moving data around. Query Engine: Handles more complex requests from clients Storage Engine: Consists of 3 things. 
Commit log, Memory Cache and Write to Disk Design TikTok # Requirements User is able to submit short video clips User is able to edit the short video clips User is able to follow other users to see what content they post etc User is able to interact with content by liking or commenting on it User will be on a feed that provides a list of short form videos to view User will be served ads in order to make money for the application Business metrics will be collected based on how users interact with the application Technical requirements Short videos need to be served at low latency with low bandwidth usage The amount of time between upload and content availability should be low? High level components User service (deals with users following other users etc) Ads service Video submission service Video viewing service Video editing service Metrics service Feed generation service Design a globally accessible database # TODO: Further research needed\nView videos presented at re:Invent conferences etc on how other companies are doing it?\nhttps://youtu.be/ilgpzlE7Hds?t=1882\nVitess architecture + PlanetScale\nRequirements:\nReads for certain data such as user data would need to be global. Need to find out where each user is from Ok for some of the data to be regional. It is assumed that most data is regional in nature Replication of user data doesn\u0026rsquo;t need to be immediate. It is ok for this data to be eventually consistent. 
(But the data needs to be highly available as users can access the service any time) Writes for \u0026ldquo;other\u0026rdquo; data need to be done with low latency (as much as possible) It is assumed that there is a high ratio of reads to writes High level components\nDistributed multi-regional NoSQL DB Set the heartbeat interval to be pretty high - since we\u0026rsquo;re having databases talking across huge regions Regional relational database clusters For storing user data and other data A cluster is needed in order to handle the high ratio of reads to writes. Reads go to the replicas while writes go to the primary The region where the user creates his/her account would be where the user\u0026rsquo;s data is first created Metrics service Collects latency information on how slowly users are able to access their data. If it\u0026rsquo;s slow and it\u0026rsquo;s determined that they\u0026rsquo;ve mostly started accessing their data from a different region - then begin propagating the data to another region? ","date":"18 October 2023","externalUrl":null,"permalink":"/system-design-notes/","section":"Posts","summary":"General framework for system design interviews # From the following website: https://www.youtube.com/watch?v=i7twT3x5yv8\nSpecify Requirements\nDesign High Level Functional Components\n","title":"System Design Notes","type":"posts"},{"content":" Introduction # Previously, the Serverless VPC Access connector was a commonly used solution to connect Cloud Run to Cloud SQL securely. This option is still available today, but with all the previous blog posts that cover how we can:\nConnect from Cloud Run to VPC Connect instance from VPC to Cloud SQL It is a simple extension to connect our Cloud Run deployment to a Cloud SQL instance without needing to set up a Serverless VPC Access connector. However, there are a few prerequisites that need to be done in order for this to work.\nHave our Cloud SQL be joined to our VPC of choice. 
Cloud SQL is usually deployed in a separate network of sorts - so this involves setting up Private Service Access - the underlying implementation is one where the \u0026ldquo;external\u0026rdquo; instance (which is Cloud SQL here) is provided an internal IP address from our VPC. This post doesn\u0026rsquo;t go into detail - you can check VPC Network Peering for the detailed networking information. Have our Cloud Run service link up to the VPC as well Deploying the migration app # The same application that was used for the previous blog posts can be used here. Do take note that we are not running the migration job here - we need to rely on another mechanism to do the migration before our application can begin running (this will probably be covered in another blog post on how this could be done in a sane way)\nLet\u0026rsquo;s say the database schema has already been set up; all we need to do is simply run the application. How shall we do this?\nThe first part is the same - build our docker image and make it available in Google Container Registry/Artifact Registry. We can do so with the following commands (tweak them if you need to push to Artifact Registry - it uses different domains)\ndocker build -t gcr.io/xxx/basic-migrate:v1 . docker push gcr.io/xxx/basic-migrate:v1 Once the container image is available in the registry, the next step is to ensure that our Cloud SQL instance is already set up with our VPC - this will not be covered in this blog post. Do refer to the previous blog posts (links at the top) on how this was done.\nThe next step is to simply set up our Cloud Run service and have it connect to our Cloud SQL instance. There are a few things to take note of though:\nThe various environment variables need to be set up. There are no proper defaults if they are not set. For more sensitive vars - we could probably also use Google Cloud\u0026rsquo;s Secret Manager but that is a story for another day. 
DATABASE_HOST DATABASE_NAME DATABASE_USER DATABASE_PASSWORD Another thing to take note of is the default command being used to run the service. The basic-migrate app is a simple binary but it has a couple of subcommands. The docker image being built does not set the app server as the default command to be run - this has to be passed to Cloud Run to properly start it The application is actually exposed on port 8888 instead of the usual 8080 Health checks need to be configured properly - a simple \u0026ldquo;TCP\u0026rdquo; check is insufficient - there is a reason why we have a /healthz endpoint - it\u0026rsquo;s for our health checks across various deployments like in a VM or in GKE. It seems to apply to Google Cloud Run as well. With that, we now have a Cloud Run service that connects directly to a Cloud SQL database. It\u0026rsquo;s somewhat disappointing that this setup falls just short of an entirely serverless stack, but it\u0026rsquo;s definitely better than running a VM instance alongside the Cloud SQL instance - it would be priced way better. (This is for smaller projects only - larger projects actually benefit from having it deployed to a proper VM or even in GKE)\n","date":"11 October 2023","externalUrl":null,"permalink":"/access-cloud-sql-from-google-cloud-run-without-serverless-vpc-access-connectors-but-via-vpc/","section":"Posts","summary":"Introduction # Previously, Serverless VPC Access connector is a commonly used solution to connect Cloud Run to Cloud SQL securely. This option is still available for use today but with all the previous blog posts that cover how we can:\n","title":"Access Cloud SQL from Google Cloud Run without Serverless VPC Access Connectors but via VPC","type":"posts"},{"content":"I intend to try out the Turso service in order to see if there is any other potential serverless database that would have a reasonable type of billing for small projects. 
There isn\u0026rsquo;t a proper SQL-based database that can be billed in a similar way to the Cloud Run product - it\u0026rsquo;ll be great if the database product were billed based on the amount of data stored or the number of read/write requests made, instead of the usual charge based on how long the instance is running (which is how Cloud SQL is billed).\nTurso is a database that is somewhat based on SQLite - but not exactly SQLite. The usual SQLite libraries generally deal with files but in this case - we would need to form some sort of network connection to \u0026ldquo;turso\u0026rdquo;, which is unlike the usual way of dealing with SQLite databases.\nI tried to do a quick integration via Golang to Turso using plain SQLite libraries that already existed but apparently, that doesn\u0026rsquo;t work as well as expected. A quick search also led to the following PR for adding support for sqld servers like Turso: https://github.com/golang-migrate/migrate/pull/1000\nAlthough the integration with the Turso database can\u0026rsquo;t be done yet, it should still be possible to start preparing the sample application that I\u0026rsquo;ve been using all this while to accept multiple database integrations - the integration needs to handle both database migration as well as running the application which would access the database. The sample application that I\u0026rsquo;ve been using (also mentioned in many of my previous blog posts) is available in this folder in the repo: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nFor now, I\u0026rsquo;m adding SQLite support as well as MySQL database support.\nSQLite and MySQL have slightly differing syntaxes, so the migrations need to be separated into appropriate folders: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate/migrations. Within this migrations folder, there is one folder for MySQL and one for SQLite. 
When we\u0026rsquo;re dealing with a MySQL database, we would rely on the migrations within the MySQL folder. Likewise, for the SQLite database, we would rely on the migration scripts within the SQLite folder.\nAs of now, I haven\u0026rsquo;t thought too deeply about how to abstract the logic for handling the different databases - right now, since there are only 2 databases supported here, it is handled via simple if/else conditional statements. The various critical configuration values are passed to it via environment variables.\n... dbUser := os.Getenv(\u0026#34;DATABASE_USER\u0026#34;) dbPass := os.Getenv(\u0026#34;DATABASE_PASSWORD\u0026#34;) dbHost := os.Getenv(\u0026#34;DATABASE_HOST\u0026#34;) dbName := os.Getenv(\u0026#34;DATABASE_NAME\u0026#34;) useTLS := os.Getenv(\u0026#34;DATABASE_USE_TLS\u0026#34;) dbType := os.Getenv(\u0026#34;DATABASE_TYPE\u0026#34;) var d source.Driver var err error dsn := \u0026#34;\u0026#34; if dbType == \u0026#34;\u0026#34; || dbType == \u0026#34;mysql\u0026#34; { dsn = fmt.Sprintf(\u0026#34;mysql://%v:%v@(%v:3306)/%v\u0026#34;, dbUser, dbPass, dbHost, dbName) if strings.ToLower(useTLS) == \u0026#34;true\u0026#34; { fmt.Println(\u0026#34;database tls mode on\u0026#34;) dsn = dsn + \u0026#34;?tls=true\u0026#34; } d, err = iofs.New(fs, \u0026#34;migrations/mysql\u0026#34;) if err != nil { log.Fatal(err) } } else if dbType == \u0026#34;sqlite\u0026#34; { sqliteFile := os.Getenv(\u0026#34;SQLITE_FILE\u0026#34;) dsn = fmt.Sprintf(\u0026#34;sqlite3://%s?query\u0026#34;, sqliteFile) d, err = iofs.New(fs, \u0026#34;migrations/sqlite\u0026#34;) if err != nil { log.Fatal(err) } } else { fmt.Println(\u0026#34;unexpected dbType provided. Please check inputs\u0026#34;) os.Exit(1) } ... 
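The if/else chain above could also be folded into one pure function, so that the per-database settings can be checked without a running database. The following Go sketch is not code from the basicMigrate repo. The names dbTarget, resolveTarget and isTrue are hypothetical, and the environment lookup is passed in as a function (the application would pass os.Getenv). The DSN formats, the ?tls=true suffix and the migration folder names are taken from the snippet above.

```go
package main

// dbTarget is a hypothetical type (not in the basicMigrate repo) grouping the
// two values the if/else chain computes: the migrate DSN and the migrations folder.
type dbTarget struct {
	DSN             string
	MigrationFolder string
}

// isTrue is an ASCII-only stand-in for the strings.ToLower(useTLS) check.
// OR-ing a byte with 0x20 lowercases ASCII letters, and for t, r, u and e
// only their upper- and lower-case forms map onto the expected byte.
func isTrue(v string) bool {
	const want = `true`
	if len(v) != len(want) {
		return false
	}
	for i := range want {
		if v[i]|0x20 != want[i] {
			return false
		}
	}
	return true
}

// resolveTarget mirrors the if/else chain: an empty or mysql DATABASE_TYPE
// selects MySQL, sqlite selects SQLite, and any other value returns false.
// getenv is injected (the real application would pass os.Getenv) so the
// function can be tested without touching the process environment.
func resolveTarget(getenv func(string) string) (dbTarget, bool) {
	switch getenv(`DATABASE_TYPE`) {
	case ``, `mysql`:
		dsn := `mysql://` + getenv(`DATABASE_USER`) + `:` + getenv(`DATABASE_PASSWORD`) +
			`@(` + getenv(`DATABASE_HOST`) + `:3306)/` + getenv(`DATABASE_NAME`)
		if isTrue(getenv(`DATABASE_USE_TLS`)) {
			dsn += `?tls=true` // appended only when TLS is requested
		}
		return dbTarget{DSN: dsn, MigrationFolder: `migrations/mysql`}, true
	case `sqlite`:
		return dbTarget{
			DSN:             `sqlite3://` + getenv(`SQLITE_FILE`) + `?query`,
			MigrationFolder: `migrations/sqlite`,
		}, true
	}
	return dbTarget{}, false
}

func main() {
	// Illustrative values only; in the application these come from the environment.
	demoEnv := map[string]string{`DATABASE_TYPE`: `sqlite`, `SQLITE_FILE`: `application.db`}
	demo, demoOK := resolveTarget(func(k string) string { return demoEnv[k] })
	if !demoOK {
		panic(`unexpected dbType provided`)
	}
	println(demo.DSN, demo.MigrationFolder)
}
```

With this shape, supporting another backend (for example Turso, assuming golang-migrate gains sqld support) would mostly mean adding one more case to the switch plus its own migrations folder.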
Notice how we\u0026rsquo;re referencing the right folder - if we\u0026rsquo;re on MySQL or MariaDB, we\u0026rsquo;re using migrations/mysql; if we\u0026rsquo;re on SQLite, we\u0026rsquo;re using migrations/sqlite\nIn order to make it convenient to test the integration of MySQL/MariaDB/SQLite with the application, there is a Makefile to do so.\nall-mysql: start-mysql build sleep 30 make migrate-mysql make start-app-mysql all-sqlite: build create-sqlite migrate-sqlite start-app-sqlite start-mysql: docker run --name some-mysql -e MYSQL_DATABASE=application -e MYSQL_ROOT_PASSWORD=my-secret-pw -e MYSQL_USER=user -e MYSQL_PASSWORD=password -p 3306:3306 -d mysql:5.7 stop-mysql: docker stop some-mysql docker rm some-mysql build: go build -o lol . migrate-mysql: DATABASE_NAME=application DATABASE_USER=user DATABASE_PASSWORD=password \\ DATABASE_HOST=localhost DATABASE_TYPE=mysql \\ ./lol migrate start-app-mysql: DATABASE_NAME=application DATABASE_USER=user DATABASE_PASSWORD=password \\ DATABASE_HOST=localhost DATABASE_TYPE=mysql \\ ./lol server migrate-sqlite: DATABASE_NAME=application DATABASE_USER=user DATABASE_PASSWORD=password \\ DATABASE_HOST=localhost DATABASE_TYPE=sqlite SQLITE_FILE=application.db \\ ./lol migrate start-app-sqlite: DATABASE_NAME=application DATABASE_USER=user DATABASE_PASSWORD=password \\ DATABASE_HOST=localhost DATABASE_TYPE=sqlite SQLITE_FILE=application.db \\ ./lol server test-app: curl -X GET localhost:8888/health curl -X POST localhost:8888/user -d \u0026#39;{\u0026#34;first_name\u0026#34;:\u0026#34;zzz\u0026#34;,\u0026#34;last_name\u0026#34;:\u0026#34;zzz\u0026#34;}\u0026#39; curl -X GET localhost:8888/user/1 create-sqlite: sqlite3 application.db \u0026#34;.databases\u0026#34; For testing the application with MySQL/MariaDB - we can simply run make all-mysql. Once the application is running, we can use the following command: make test-app.\nFor testing the application with SQLite - we can simply run make all-sqlite. 
Once the application is running, we can use the following command: make test-app.\n","date":"4 October 2023","externalUrl":null,"permalink":"/multiple-database-support-mysql-and-sqlite-support/","section":"Posts","summary":"I intend to try out the Turso service in order to see if there is any other potential serverless database that would have pretty decent type of billing for small projects. There isn’t a proper SQL based database that can be billed in a similar way to the Cloud Run product - it’ll be great if the billing of the database product would be along the amount of data being stored or amount of read/write requests done for the data instead of the usual charged based on how long the instance being run (based on how Cloud SQL is billed).\n","title":"Multiple Database Support - MySQL and SQLite support","type":"posts"},{"content":"Serverless computing, as seen in platforms like Cloud Run or AWS Lambda, allows developers to run code without managing the underlying infrastructure. This is achieved by automatically scaling the resources based on the incoming requests, and users are billed based on the actual execution time and resources consumed during each function or container invocation.\nWhen it comes to databases, managed database services exist, but they often involve a more traditional pricing model based on allocated resources, storage, and sometimes a provisioned throughput. These databases might offer automation for certain tasks like backups, updates, and scaling, but they do not strictly follow the same \u0026ldquo;pay only for what you use\u0026rdquo; model as serverless compute services.\nThe challenge with creating a fully serverless database that aligns with the pricing model of serverless compute services lies in the nature of databases. 
Databases often require persistent storage, continuous availability, and consistent performance, which makes it challenging to implement a pure pay-as-you-go model.\nOne of the services whose billing model comes closest to services such as Cloud Run is the database provided by PlanetScale. PlanetScale provides MySQL databases (or MySQL-like databases) and bills its users based on the read and write requests made to the database as well as the amount of data stored in the database. We can compare this to the usual billing approach of services such as Cloud SQL, which charges its users based on the amount of time a Cloud SQL instance is kept running.\nThe usual billing approach is probably cheaper at larger scales, but for smaller projects that still require a MySQL database - it does seem quite unreasonable to pay for a small server instance that needs to be kept alive even though the number of data requests is pretty small. The alternative here would be to switch over to databases that have that billing model - in Google Cloud, we would still have Cloud Datastore, which bills users on the data being stored - but it\u0026rsquo;s a completely different database - we would have to alter a huge chunk of code in order to be able to access and manipulate the data in it.\nCreating a database on PlanetScale # It is pretty straightforward to create a database on PlanetScale - the product set is pretty small - most of the choices made available to the user pertain to the size of the database being requested. Naturally, with a small project, we would request a small database.\nDuring the provisioning step, we are given a generated username and password which we would then use to connect to the database. 
After the provisioning of the database, we end up on a page that shows the overall details of the database that we provisioned.\nEven if we missed the connection details during provisioning, we can simply click on the \u0026ldquo;Connect\u0026rdquo; button, which reveals the various methods to connect to the database - per language or via various CLIs such as PlanetScale\u0026rsquo;s own CLI tool.\nConnecting to the database via a sample application # Naturally, a good way to check if an approach is viable is to have a sample application connect to the database. We can reuse the following application to connect our serverless application deployed on Cloud Run to the newly provisioned database on PlanetScale (which is also billed in a serverless-ish manner). https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nI\u0026rsquo;ll be skipping possible approaches on how to run a database migration and instead focus on how to get the application running, as well as specific important things to take into account when doing so.\nIt seems that PlanetScale only allows us to connect to it via a TLS connection. Hence, our connection string has to contain the tls=true parameter in order for us to handle that. However, if we have that permanently, local development becomes a pretty painful process - we shouldn\u0026rsquo;t have to set up TLS certs for our own local MySQL instance for testing - that\u0026rsquo;ll be too much overhead just to test a certain functionality. This is why we simply add a condition that appends the TLS setting only when the DATABASE_USE_TLS environment variable is set to true. 
Health checks are done on the /health endpoint. The database is external - so we don\u0026rsquo;t need to tinker around with SQL proxies or even connect it to a VPC (refer to previous posts on how to connect Cloud Run to Cloud SQL) - that\u0026rsquo;ll save on administrative effort. Do note that Cloud Run has issues supporting endpoints that end with z - hence a recent change to remove that (check the history of the main.go file if you\u0026rsquo;re curious)\n","date":"27 September 2023","externalUrl":null,"permalink":"/serverless-applications-with-cloud-run-with-serverless-mysql-from-planetscale/","section":"Posts","summary":"Serverless computing, as seen in platforms like Cloud Run or AWS Lambda, allows developers to run code without managing the underlying infrastructure. This is achieved by automatically scaling the resources based on the incoming requests, and users are billed based on the actual execution time and resources consumed during each function or container invocation.\n","title":"Serverless Applications with Cloud Run with Serverless MySQL from PlanetScale","type":"posts"},{"content":" Introduction # Similar to my previous blog post, we would usually connect Google Kubernetes Engine (GKE) clusters to Cloud SQL databases by using the Cloud SQL Proxy. However, we can now use Private Service Connect, which allows for private communication between different Google Cloud services, similar to how we connected our application in Google Compute Engine (VM) to a Cloud SQL instance.\nChecking for connectivity # Similar to the previous post, where we checked that we can connect to the Cloud SQL instance from our Google Compute Engine instance, we can do the same for our application in Google Kubernetes Engine. 
However, this first involves starting a small application in which we can install the tools that we need to test connectivity to the Cloud SQL database.\nFirst, let\u0026rsquo;s create an nginx container and have it running in our cluster. I\u0026rsquo;ll assume that you are familiar with connecting to a Kubernetes cluster provisioned in Google Cloud.\nkubectl create deployment lol --image=nginx Once it is up and running, we can go into the container via the following commands:\nkubectl get pods kubectl exec -it \u0026lt;pod-name\u0026gt; -- /bin/bash The pod name can be picked from the output of the kubectl get pods command. Next, we would install the nmap tool. (Do note that we can\u0026rsquo;t ping our Cloud SQL instance)\napt update \u0026amp;\u0026amp; apt install -y nmap We can then run the nmap command against the private IP address provided after provisioning our Cloud SQL instance.\n$ nmap -Pn x.x.x.x Starting Nmap 7.93 ( https://nmap.org ) at 2024-01-21 12:03 UTC Nmap scan report for x.x.x.x Host is up (0.0016s latency). Not shown: 999 filtered tcp ports (no-response) PORT STATE SERVICE 3306/tcp open mysql Nmap done: 1 IP address (1 host up) scanned in 4.28 seconds From the above, it seems that we can connect to the database just fine from a pod within the cluster.\nDeploy a helm chart # The next step would be to deploy a helm chart that makes use of our database. We use the same application as in the previous blog post: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nWe can run the following commands to build the docker image:\ndocker build -t gcr.io/xxx/basic-migrate:v1 . docker push gcr.io/xxx/basic-migrate:v1 We would need to create the following custom yaml file to feed to our helm chart. These values override the defaults of the helm chart that will be installed to our cluster. 
Do take note of the important value to be changed, which is appConfig.databaseHost.\nimage: repository: gcr.io/xxx/basic-migrate tag: \u0026#34;v1\u0026#34; resources: limits: cpu: 500m memory: 500Mi requests: cpu: 250m memory: 256Mi appConfig: databaseHost: x.x.x.x We will run the helm chart installation with the above yaml configuration.\nhelm install -f app-values.yaml basic ./basicMigrate After waiting a while, we should see it successfully installed on our cluster.\n$ kubectl get pods NAME READY STATUS RESTARTS AGE basic-basic-migrate-c478fd699-mbctt 1/1 Running 0 13s basic-basic-migrate-migrate-49hw8 0/1 Completed 0 19s lol-69f74bb-x5pkj 1/1 Running 0 19m Checking that it actually works # We would still want to double check that the whole setup above works. We can do so by again making use of our lol deployment that uses the nginx docker image. First we would install the mariadb-client deb package.\napt install -y mariadb-client Next, we can run the following command:\nmysql -h x.x.x.x -u root -p It will prompt you for a password. Once the right password is entered, we would be able to start manipulating the database with root credentials. Next, we would run the following SQL commands.\n\u0026gt; use application; \u0026gt; show tables; +-----------------------+ | Tables_in_application | +-----------------------+ | schema_migrations | | users | +-----------------------+ 2 rows in set (0.002 sec) The important part here is that the users table exists - that would implicitly indicate that the migration ran successfully - naturally, we can do further tests - but this should be sufficient for now.\n","date":"20 September 2023","externalUrl":null,"permalink":"/access-cloud-sql-from-google-kubernetes-cluster-without-cloud-sql-proxy/","section":"Posts","summary":"Introduction # Similar to my previous blog post, we would usually be connecting Google Kubernetes Engine (GKE) clusters to Cloud SQL databases by using the Cloud SQL Proxy. 
However, we can now use Private Service Connect, which allows for private communication between different Google Cloud services, similar to how we did for connecting our application in Google Compute Engine (VM) to a Cloud SQL instance.\n","title":"Access Cloud SQL from Google Kubernetes Cluster without Cloud SQL Proxy","type":"posts"},{"content":"Traditionally, when connecting a Google Compute Engine instance to a Cloud SQL database, the Cloud SQL Proxy was commonly used to facilitate secure connections. The Cloud SQL Proxy acted as an intermediary between the application running on a Compute Engine instance and the Cloud SQL database. It helped to secure the connection by using the Cloud SQL IAM database authentication and provided a way to connect to the database using a Unix socket.\nHowever, Google introduced Private Service Connect, which allows Google Compute Engine VMs to connect to databases through private IPs. With Private Service Connect, you can create a private connection between your Google Compute Engine instances and your Cloud SQL database without needing the Cloud SQL Proxy. Private Service Connect enables secure and direct communication between the instances and the Cloud SQL database using private IP addresses.\nThe new approach avoids introducing another hop in the network path when sending data from our application in a Google Compute Engine (VM) to the Cloud SQL database. The rest of the post covers how we can do so (the experience is almost as though it\u0026rsquo;s just another \u0026ldquo;VM\u0026rdquo; on our internal network)\nSetting up a Cloud SQL database # We can set up a Cloud SQL database to test this feature out. I have a sample application that would interact with a MySQL database. 
The important bit when setting up the Cloud SQL instance is to expose the instance privately (via a private IP).\nAfter creating the instance, it should provide a private IP address that our VM instance should be able to access.\nUsing mariadb-client to connect to our database # Apparently, the ping command is unable to reach the IP address of the SQL instance.\nping x.x.x.x PING x.x.x.x (x.x.x.x) 56(84) bytes of data. ^C --- x.x.x.x ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2031ms We need to use the nmap command instead.\n$ sudo apt update \u0026amp;\u0026amp; sudo apt install -y nmap $ nmap x.x.x.x Starting Nmap 7.80 ( https://nmap.org ) at 2024-01-21 03:21 UTC Note: Host seems down. If it is really up, but blocking our ping probes, try -Pn Nmap done: 1 IP address (0 hosts up) scanned in 3.03 seconds ~$ nmap -Pn x.x.x.x Starting Nmap 7.80 ( https://nmap.org ) at 2024-01-21 03:23 UTC Nmap scan report for x.x.x.x Host is up (0.0034s latency). Not shown: 999 filtered ports PORT STATE SERVICE 3306/tcp open mysql Nmap done: 1 IP address (1 host up) scanned in 6.54 seconds Now that we know we are able to access the MySQL instance from our Google Compute Engine VM, we can proceed to install mariadb-client in order to manipulate our created database.\n$ sudo apt update \u0026amp;\u0026amp; sudo apt install -y mariadb-client $ $ mysql -h x.x.x.x -u root -p Enter password: Welcome to the MariaDB monitor. Commands end with ; or \\g. Your MySQL connection id is 527 Server version: 8.0.31-google (Google) Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type \u0026#39;help;\u0026#39; or \u0026#39;\\h\u0026#39; for help. Type \u0026#39;\\c\u0026#39; to clear the current input statement. 
MySQL [(none)]\u0026gt; We can then run some quick commands to list all databases on our instance\n\u0026gt; Show databases; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | sys | +--------------------+ 4 rows in set (0.005 sec) Let\u0026rsquo;s add a new database from the Google Cloud Console UI\n\u0026gt; Show databases; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | sample | | sys | +--------------------+ 5 rows in set (0.003 sec) Using the database with our application # Surprisingly, our newly created user seems to have the ability to manage the new sample database. That is somewhat convenient as we don\u0026rsquo;t have to worry about our application lacking the required permissions. The normal thing to do when we create users on a newly created SQL database would be to grant them privileges - but we don\u0026rsquo;t need to do it here, since the user has been granted the cloudsqlsuperuser role, as the grants below show.\n\u0026gt; use mysql; \u0026gt; select user from user; \u0026gt; show grants for sample; +-----------------------------------------------+ | Grants for sample@% | +-----------------------------------------------+ | GRANT USAGE ON *.* TO `sample`@`%` | | GRANT `cloudsqlsuperuser`@`%` TO `sample`@`%` | +-----------------------------------------------+ 2 rows in set (0.008 sec) The following code from this folder in my repo serves as a quick way to test this: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nFirst, we would need to compile and push the binary over to the server. At the same time, we would also need to create a new database called application. The migration scripts in our application seem to only connect to a database named application.\n$ GOOS=linux CGO_ENABLED=0 go build -o app . 
$ scp app hairizuan@146.148.92.172:app $ DATABASE_USER=sample DATABASE_PASSWORD=sample DATABASE_HOST=10.92.64.6 DATABASE_NAME=application ./app migrate Note that passing the credentials inline like this is a bad way to run our application, since the password ends up in the shell history. It would be better to start the application via tooling or scripts so that the credentials are not stored in any history.\nA silent response from the application should mean that it ran successfully and that the database migration completed with no issues. We can double check this by going into our database:\n\u0026gt; use application; \u0026gt; show tables; +-----------------------+ | Tables_in_application | +-----------------------+ | schema_migrations | | users | +-----------------------+ 2 rows in set (0.004 sec) Closing thoughts # It is nice to no longer need to install a separate binary just to use Cloud SQL. It always felt somewhat counter-intuitive that such tooling was needed; the current approach is more aligned with how we usually connect to databases: simply point our binary at a host and its corresponding port.\n","date":"13 September 2023","externalUrl":null,"permalink":"/access-cloud-sql-from-google-compute-engine-without-cloud-sql-proxy/","section":"Posts","summary":"Traditionally, when connecting a Google Compute Engine instance to a Cloud SQL database, the Cloud SQL Proxy was commonly used to facilitate secure connections. The Cloud SQL Proxy acted as an intermediary between the application running on a Compute Engine instance and the Cloud SQL database. 
It helped to secure the connection by using the Cloud SQL IAM database authentication and provided a way to connect to the database using a Unix socket.\n","title":"Access Cloud SQL from Google Compute Engine without Cloud SQL Proxy","type":"posts"},{"content":"Google Cloud Run is a serverless compute platform that automatically scales applications in response to traffic. It is designed to run stateless containers, meaning that the instances of your application are ephemeral and can be spun up or down as needed. This design choice has implications for data storage, particularly when it comes to persistence.\nOne notable limitation of Google Cloud Run is that it doesn\u0026rsquo;t have built-in persistent storage. Each instance of a Cloud Run service operates independently and is stateless. When an instance is scaled down to zero or replaced by a new one, any data stored locally on that instance is lost.\nHowever, not all applications can be completely stateless, with no data that needs to be stored. Some still require data to be stored somewhere on disk. Conveniently enough, this is where we can set up a kind of \u0026ldquo;fake\u0026rdquo; filesystem that such applications can take advantage of. Refer to the following details: https://cloud.google.com/run/docs/tutorials/network-filesystems-fuse\nBuilding the application and including FUSE in it # We can have a simple Python app that creates files on \u0026ldquo;disk\u0026rdquo;. 
Here is one example of such an app\nfrom flask import Flask from datetime import datetime import os app = Flask(__name__) @app.route(\u0026#34;/create\u0026#34;) def access(): value = datetime.now() if os.getenv(\u0026#34;FOLDER\u0026#34;) is not None: folder = os.getenv(\u0026#34;FOLDER\u0026#34;) else: folder = \u0026#34;/app/\u0026#34; file_location = folder + value.strftime(\u0026#34;%Y-%m-%d-%H-%M-%S\u0026#34;) try: with open(file=file_location, mode=\u0026#39;w\u0026#39;) as file: file.write(str(value)) return \u0026#34;the following file is created: {}\u0026#34;.format(file_location) except Exception as e: print(e) return \u0026#34;unable to create file. check logs for error\u0026#34; @app.route(\u0026#34;/\u0026#34;) def hello_world(): return \u0026#34;\u0026lt;p\u0026gt;Hello, World!\u0026lt;/p\u0026gt;\u0026#34; Calling the /create endpoint creates a file named after the current timestamp, with that timestamp written inside it.\nFor the requirements.txt, we only need the flask module.\nflask Here is the Dockerfile for our application. Note that we are simply following the guide from the URL above, which uses the tini utility.\nFROM python:3.11.7-slim-bookworm RUN apt-get update \u0026amp;\u0026amp; apt-get install -y curl gnupg lsb-release tini \u0026amp;\u0026amp; \\ gcsFuseRepo=gcsfuse-`lsb_release -c -s` \u0026amp;\u0026amp; \\ echo \u0026#34;deb https://packages.cloud.google.com/apt $gcsFuseRepo main\u0026#34; | \\ tee /etc/apt/sources.list.d/gcsfuse.list \u0026amp;\u0026amp; \\ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \\ apt-key add - \u0026amp;\u0026amp; \\ apt-get update \u0026amp;\u0026amp; \\ apt-get install -y gcsfuse \u0026amp;\u0026amp; \\ apt-get clean WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY hello.py . COPY gcsfuse_run.sh . 
# Set fallback mount directory ENV MNT_DIR /mnt/gcs ENV BUCKET hairizuan-cloud-run-gcsfuse # Ensure the script is executable RUN chmod +x /app/gcsfuse_run.sh # Use tini to manage zombie processes and signal forwarding # https://github.com/krallin/tini ENTRYPOINT [\u0026#34;/usr/bin/tini\u0026#34;, \u0026#34;--\u0026#34;] # Pass the startup script as arguments to Tini CMD [\u0026#34;/app/gcsfuse_run.sh\u0026#34;] To run the application, we need to do two things:\nStart the gcsfuse binary to mount the bucket to a particular folder Start our Python application Bonus points: ensure that any SIGTERM signal sent to the shell script is propagated to all relevant processes Hence, here is the shell script for it:\n#!/usr/bin/env bash set -eo pipefail # Create mount directory for service mkdir -p $MNT_DIR echo \u0026#34;Mounting GCS Fuse.\u0026#34; gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR echo \u0026#34;Mounting completed.\u0026#34; flask --app hello run --host 0.0.0.0 To get the above Docker image running on Cloud Run, we do the usual: build the Docker image and push it to Container Registry or Artifact Registry.\ndocker build -t gcr.io/xxx/flask-tester:0.0.1 . docker push gcr.io/xxx/flask-tester:0.0.1 Manual steps to take note of # To run our Cloud Run service, we need to do a few things:\nCreate a service account (ideally) Ensure that the service account has the \u0026ldquo;Storage Object Admin\u0026rdquo; role so that it can list, create and manipulate objects in the bucket. It probably also needs the \u0026ldquo;Cloud Run Invoker\u0026rdquo; role to ensure that it is able to start the Cloud Run service accordingly. Ensure that the FOLDER environment variable is set. If we are not altering the default MNT_DIR, we can simply set FOLDER to /mnt/gcs/. 
For the above Python script, we need to add the trailing slash after gcs, since the script builds file paths by plain string concatenation rather than properly joining them (a good thing to fix in the future). ","date":"6 September 2023","externalUrl":null,"permalink":"/persistance-in-google-cloud-run-with-fuse-storage-to-google-cloud-storage/","section":"Posts","summary":"Google Cloud Run is a serverless compute platform that automatically scales applications in response to traffic. It is designed to run stateless containers, meaning that the instances of your application are ephemeral and can be spun up or down as needed. This design choice has implications for data storage, particularly when it comes to persistence.\n","title":"Persistance in Google Cloud Run with FUSE storage to Google Cloud Storage","type":"posts"},{"content":"","date":"6 September 2023","externalUrl":null,"permalink":"/categories/python/","section":"Article Categories","summary":"","title":"Python","type":"categories"},{"content":"","date":"6 September 2023","externalUrl":null,"permalink":"/tags/python/","section":"Technology Tags","summary":"","title":"Python","type":"tags"},{"content":"The typical way to access Google Compute instances from Cloud Run is via Serverless VPC Access. However, setting this up means that we are essentially creating an instance that is used as a proxy to send traffic from Cloud Run to the Google Compute instance.\nThings have changed quite a bit now. We no longer need this; we can now connect directly to a Google Compute instance without Serverless VPC Access. 
The best reference page for this is: https://cloud.google.com/run/docs/configuring/vpc-direct-vpc\nThis is the Flask app that we use to test this functionality\nfrom flask import Flask import requests import logging app = Flask(__name__) @app.route(\u0026#34;/access-instance\u0026#34;) def access(): try: # TODO: Allow this to be configurable externally resp = requests.get(\u0026#34;http://10.128.0.30\u0026#34;) logging.info(resp.status_code) return \u0026#34;\u0026lt;p\u0026gt;\u0026#34; + resp.text + \u0026#34;\u0026lt;/p\u0026gt;\u0026#34; except Exception as e: logging.error(e) return \u0026#34;\u0026lt;p\u0026gt;Failed to access instance 1\u0026lt;/p\u0026gt;\u0026#34; @app.route(\u0026#34;/\u0026#34;) def hello_world(): return \u0026#34;\u0026lt;p\u0026gt;Hello, World!\u0026lt;/p\u0026gt;\u0026#34; The root endpoint simply allows us to check that the application is working as expected. The /access-instance endpoint is the one that reaches out to the Google Compute instance. Note that the IP address above is something you need to configure: it is the private IP address assigned to your Google Compute instance.\nSince Google Cloud Run runs Docker images, we also need a Dockerfile\nFROM python:3.11.7-slim-bookworm WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY hello.py . CMD [\u0026#34;flask\u0026#34;, \u0026#34;--app\u0026#34;, \u0026#34;hello\u0026#34;, \u0026#34;run\u0026#34;, \u0026#34;--host\u0026#34;, \u0026#34;0.0.0.0\u0026#34;] To run our Flask application, we need the flask and requests Python libraries, which are listed in the requirements.txt file.\nflask requests To push our image to our registry, we can run the following commands:\ndocker build -t gcr.io/xxx/flask-test:0.0.1 . 
docker push gcr.io/xxx/flask-test:0.0.1 We can then go to the Google Cloud Console to create our Google Cloud Run service as usual. Some of the properties that we need to set are:\nWhich container our Google Cloud Run service uses Maximum number of instances that our Cloud Run service can scale to Allow the Google Cloud Run service to be accessed externally without authentication (for easier testing) However, the most important configuration is the networking section. We simply need to configure the flask tester\u0026rsquo;s networking section to send traffic to a VPC, e.g. the default VPC if our Google Compute instance is in it.\nAfter the configuration work, we should see something like this in the networking tab.\nThe next part is simply to do the following:\nCreate a Google Compute instance in the VPC that we set in our Google Cloud Run service (we can technically create this before the Cloud Run service; they are not dependent on each other). For convenience when logging into the server to configure it, we can give it a public IP address On our Google Compute instance, install nginx, a convenient HTTP server that immediately provides an endpoint we can connect to (without writing any server code) Use the automatically generated Google Cloud Run endpoint and access the /access-instance endpoint to check that the traffic is received properly and that we get the expected result. With that, we have a small demo of how this feature works.\nHowever, there are a few things to take note of or figure out in future blog posts:\nThis blog post covers how a Cloud Run service accesses a Google Compute instance. We are accessing it via its IP address, which is not ideal: in most cases, IP addresses for Google Compute instances are assigned automatically (we are unlikely to use static IP addresses). 
It might be good to figure out a way to access it by name instead. However, this might not be needed: if we are talking about a service on virtual machines that can scale, we would need it behind a load balancer anyway. Maybe a future blog post could explore how to connect Google Cloud Run to an internal load balancer with a couple of virtual machines behind it. We only covered how Google Cloud Run can connect to a Google Compute instance, not the other way round. Connecting from a Google Compute instance to a Google Cloud Run service is a topic for another time. ","date":"30 August 2023","externalUrl":null,"permalink":"/accessing-google-compute-instances-via-cloud-run/","section":"Posts","summary":"The typical way to access Google Compute instances from Cloud Run is via Serverless VPC Access. However, setting this up means that we are essentially creating an instance that is used as a proxy to send traffic from Cloud Run to the Google Compute instance.\n","title":"Accessing Google Compute Instances via Cloud Run","type":"posts"},{"content":"We can apparently now store helm charts in Docker registries - this was made available via helm commands since v3.8.0. 
https://helm.sh/docs/topics/registries/\nNow that this is available, we can use it across a variety of storage mechanisms (in the past, the chart artifacts had to be managed on some sort of file system, with an index file listing all available helm charts).\nTo try things out locally, let\u0026rsquo;s set up a simple Docker registry on our host machine:\ndocker run -d -p 5000:5000 --restart always --name registry registry:2 We can then try to push a golang image into it\ndocker pull golang docker tag golang:latest localhost:5000/golang:latest docker push localhost:5000/golang:latest Building a helm chart and pushing it in # First, we need to ensure that our helm version supports pushing helm charts into OCI registries (v3.8.0 or later)\nhelm version # Output: # version.BuildInfo{Version:\u0026#34;v3.10.1\u0026#34;, GitCommit:\u0026#34;9f88ccb6aee40b9a0535fcc7efea6055e1ef72c9\u0026#34;, GitTreeState:\u0026#34;clean\u0026#34;, GoVersion:\u0026#34;go1.18.7\u0026#34;} Next, we need a helm chart. Here is an example of one:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicHelm\nWithin this folder, run the following command to package it:\nhelm package ./basic-app/ This command creates the basic-app-0.1.0.tgz file, which we can then push to the registry\nhelm push basic-app-0.1.0.tgz oci://localhost:5000/basic-app Pushed: localhost:5000/basic-app/basic-app:0.1.0 Digest: sha256:6d3557ff6044f490d535e0cda7bbf979c7879a1380af6cf6a1dc9d8b532d5134 With that, we now have a proper place to put our helm artifacts: they can all be centralized in container registries (since Helm supports the OCI standard). We no longer need to consider putting them in alternative storage locations, e.g. 
a simple filesystem or even blob storage.\nMore details are available on the following documentation page:\nhttps://helm.sh/docs/topics/registries/\n","date":"23 August 2023","externalUrl":null,"permalink":"/storing-helm-in-docker-registries/","section":"Posts","summary":"We can apparently now store helm charts in Docker registries - this was made available via helm commands since v3.8.0. https://helm.sh/docs/topics/registries/\nNow that this is available, we can use it across a variety of storage mechanisms (in the past, the chart artifacts had to be managed on some sort of file system, with an index file listing all available helm charts).\n","title":"Storing Helm in Docker Registries","type":"posts"},{"content":" Inspirations # I was watching the following talk by Richard Feldman: https://www.youtube.com/watch?v=zX-kazAtX0c\u0026ab_channel=ChariotSolutions. He covered a pretty interesting topic: how one would \u0026ldquo;slowly\u0026rdquo; migrate a codebase from one language to another. Let\u0026rsquo;s say the codebase for an application is pretty large - how would we safely move it over and change it without increasing the deployment targets? Let\u0026rsquo;s say we\u0026rsquo;re not in microservices land and it is difficult for us to deploy a whole other server just to begin migrating languages.\nThere were a few ideas presented in the video:\nCreate a locally running server that communicates over sockets with the main application Wasi/wasm binaries that communicate with the main application A translation layer between languages (in most languages, the common layer would be a C layer), due to differences in memory management between languages. Rather than covering the main idea of that video, we will focus on the WASI idea presented in it. 
One reason is that the Golang 1.21 release added wasip1 as a compilation target. I was curious to see whether the support is sufficient to get something like this working easily.\nImplementation # To get something working, we first need some sample Golang code that we want to expose to the Python script.\npackage main import \u0026#34;fmt\u0026#34; func sum(x, y int) int { return x + y } func main() { fmt.Println(\u0026#34;testing\u0026#34;) } From the above code, we want the sum function to be callable from Python with little to no issues. We can compile the wasm binary with the following command:\nGOOS=wasip1 GOARCH=wasm go build -o lol.wasm main.go There isn\u0026rsquo;t much information on how Python can call Golang wasm binaries. However, there is a project called Wasmer (https://wasmer.io/) that covers how such wasm binaries can be called, and it is available as a Python library:\nfrom wasmer import engine, wasi, Store, Module, ImportObject, Instance from wasmer_compiler_cranelift import Compiler wasm_bytes = open(\u0026#39;lol.wasm\u0026#39;, \u0026#39;rb\u0026#39;).read() store = Store(engine.Universal(Compiler)) module = Module(store, wasm_bytes) wasi_version = wasi.get_version(module, strict=True) wasi_env = \\ wasi.StateBuilder(\u0026#39;wasi_test_program\u0026#39;). \\ argument(\u0026#39;--test\u0026#39;). \\ environment(\u0026#39;COLOR\u0026#39;, \u0026#39;true\u0026#39;). \\ environment(\u0026#39;APP_SHOULD_LOG\u0026#39;, \u0026#39;false\u0026#39;). \\ map_directory(\u0026#39;the_host_current_dir\u0026#39;, \u0026#39;.\u0026#39;). \\ finalize() import_object = wasi_env.generate_import_object(store, wasi_version) instance = Instance(module, import_object) yahoo = instance.exports.sum(12, 12) print(yahoo) Unfortunately, this is the first error that appeared. 
From initial checks on various pages (e.g. https://github.com/wasmerio/wasmer-python/issues/657), it could be that wasmer isn\u0026rsquo;t fully supported on macOS. I haven\u0026rsquo;t gotten around to investigating this error further; it could also be some dependency that I didn\u0026rsquo;t install.\n% python yyy.py 3 Traceback (most recent call last): File \u0026#34;/XXXX/static-python/yyy.py\u0026#34;, line 1, in \u0026lt;module\u0026gt; from wasmer import engine, wasi, Store, Module, ImportObject, Instance File \u0026#34;/XXX/static-python-p5Sx-hLS/lib/python3.11/site-packages/wasmer/__init__.py\u0026#34;, line 1, in \u0026lt;module\u0026gt; raise ImportError(\u0026#34;Wasmer is not available on this system\u0026#34;) ImportError: Wasmer is not available on this system A quick workaround is to run it inside a Python Docker container, where it runs on a Linux kernel (open source tooling usually has better support on Linux). We can set that up with the following Dockerfile:\nFROM python:3.9 WORKDIR /lol COPY . . RUN pip install -r /lol/requirements.txt The requirements.txt file:\nwasmer wasmer_compiler_cranelift After that, we can run the following commands to build the Docker image and start a container in which we can run the Python script (which loads the wasm/wasi binary).\ndocker build -t lol . docker run -it lol /bin/bash We now face a new problem. I thought it could be because Golang only exports functions whose names start with a capital letter, so I tried that, but I faced the same missing export error.\nTraceback (most recent call last): File \u0026#34;/lol/yyy.py\u0026#34;, line 19, in \u0026lt;module\u0026gt; yahoo = instance.exports.Sum(12, 12) LookupError: Export `sum` does not exist. 
Apparently, the wasmer CLI is quite useful for debugging the issues we\u0026rsquo;re facing here: https://github.com/golang/go/issues/58141\n% wasmer inspect lol.wasm Type: wasm Size: 2.0 MB Imports: Functions: \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;sched_yield\u0026#34;: [] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;proc_exit\u0026#34;: [I32] -\u0026gt; [] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;args_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;args_sizes_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;clock_time_get\u0026#34;: [I32, I64, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;environ_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;environ_sizes_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_write\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;random_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;poll_oneoff\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_close\u0026#34;: [I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_write\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_fdstat_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_fdstat_set_flags\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_prestat_get\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_prestat_dir_name\u0026#34;: [I32, I32, I32] -\u0026gt; [I32] Memories: Tables: Globals: Exports: Functions: 
\u0026#34;_start\u0026#34;: [] -\u0026gt; [] Memories: \u0026#34;memory\u0026#34;: not shared (271 pages..) Tables: Globals: It turns out that we need to \u0026ldquo;expose\u0026rdquo; functions from our binaries, and that\u0026rsquo;s not supported at the moment\u0026hellip;\nWASI Libraries (AKA Reactors)\nThe WASI concept of libraries allow compiled binaries to expose single functions for consumption from the host. This is not something that will be supported in the initial WASI port, as it requires a concept of marking Go functions as exported (i.e. //go:wasmexport), and somehow facilitating the execution of a single function. For more discussions on why this is complicated, see #42372.\nSince we\u0026rsquo;re already at this stage, I wondered whether there was any way to get this example working without waiting for the Golang team to release the function exporting feature for wasi binaries.\nApparently, we can rely on TinyGo, which has supported wasm/wasi for a long time, even when the wasm/wasi projects were in their infancy.\nbrew tap tinygo-org/tools brew install tinygo With that, we can try to compile it, but with a slight modification to our Golang code\npackage main import \u0026#34;fmt\u0026#34; //export sum func sum(x, y int) int { return x + y } func main() { fmt.Println(\u0026#34;testing\u0026#34;) } We introduced the //export sum comment to tell the compiler to expose our sum function so that our Python script can use it.\nWe can compile the above code by running the following command:\ntinygo build -o lol.wasm -target wasm ./main.go With that, we have a built wasm/wasi binary file that we can use in our Python script. To ensure that the function is exported, we can inspect it. 
Notice that within the exports section, we now have a sum function that somewhat resembles our function signature.\n% wasmer inspect lol.wasm Type: wasm Size: 410.7 KB Imports: Functions: \u0026#34;env\u0026#34;.\u0026#34;runtime.ticks\u0026#34;: [] -\u0026gt; [F64] \u0026#34;wasi_snapshot_preview1\u0026#34;.\u0026#34;fd_write\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [I32] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueGet\u0026#34;: [I32, I32, I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valuePrepareString\u0026#34;: [I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueLoadString\u0026#34;: [I32, I32, I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.finalizeRef\u0026#34;: [I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.stringVal\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueSet\u0026#34;: [I32, I32, I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueLength\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueIndex\u0026#34;: [I32, I32, I32, I32] -\u0026gt; [] \u0026#34;env\u0026#34;.\u0026#34;syscall/js.valueCall\u0026#34;: [I32, I32, I32, I32, I32, I32, I32, I32] -\u0026gt; [] Memories: Tables: Globals: Exports: Functions: \u0026#34;malloc\u0026#34;: [I32] -\u0026gt; [I32] \u0026#34;free\u0026#34;: [I32] -\u0026gt; [] \u0026#34;calloc\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;realloc\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;_start\u0026#34;: [] -\u0026gt; [] \u0026#34;resume\u0026#34;: [] -\u0026gt; [] \u0026#34;go_scheduler\u0026#34;: [] -\u0026gt; [] \u0026#34;sum\u0026#34;: [I32, I32] -\u0026gt; [I32] \u0026#34;asyncify_start_unwind\u0026#34;: [I32] -\u0026gt; [] \u0026#34;asyncify_stop_unwind\u0026#34;: [] -\u0026gt; [] \u0026#34;asyncify_start_rewind\u0026#34;: [I32] -\u0026gt; [] \u0026#34;asyncify_stop_rewind\u0026#34;: [] 
-\u0026gt; [] \u0026#34;asyncify_get_state\u0026#34;: [] -\u0026gt; [I32] Memories: \u0026#34;memory\u0026#34;: not shared (2 pages..) Tables: Globals: Once we have everything set up, we can simply rebuild the Docker image and then run the Python script\n% docker run -it lol /bin/bash root@4227988ec17f:/lol# python yyy.py 24 Reflections # A few points came to mind as I went through the steps above:\nApparently the documentation for getting wasi/wasm working is quite fragmented and unclear. There is no one clear way of building wasi/wasm binaries, and no clear and obvious way for other languages to consume them. The above steps introduce a significant amount of complexity; they almost convince me that it might be better to simply use the strangler approach when moving applications between programming languages (although it would cost quite a bit). The above example is extremely simple and doesn\u0026rsquo;t use most of the useful Golang functionality yet. Since we\u0026rsquo;re using TinyGo, we need to be aware that not all functionality may be ported over; some things may not work as expected, and we will probably need to experiment further to see what the differences are. The devil is always in the details; who would have known that we would need an explicit step to mark which functions should be exported. References # Hopefully there will be the introduction of go:wasmexport https://github.com/golang/go/issues/42372 The issue for compiling with GOOS=wasip1 GOARCH=wasm to create wasi binaries https://github.com/golang/go/issues/58141 Instructions for installing TinyGo: https://tinygo.org/getting-started/install/macos/ Stack Overflow question on the \u0026ldquo;exported\u0026rdquo; functions issue: 
https://stackoverflow.com/questions/67978442/go-wasm-export-functions Example Python script: https://github.com/wasmerio/wasmer-python/blob/master/examples/engine_universal.py https://github.com/wasmerio/wasmer-python/issues/657 https://github.com/wasmerio/wasmer-python/issues/712 ","date":"16 August 2023","externalUrl":null,"permalink":"/python-call-golang-functions-via-wasm/wasi/","section":"Posts","summary":"Inspirations # I was watching the following talk by Richard Feldman: https://www.youtube.com/watch?v=zX-kazAtX0c\u0026ab_channel=ChariotSolutions. He covered a pretty interesting topic: how one would “slowly” migrate a codebase from one language to another. Let’s say the codebase for an application is pretty large - how would we safely move it over and change it without increasing the deployment targets? Let’s say we’re not in microservices land and it is difficult for us to deploy a whole other server just to begin migrating languages.\n","title":"Python call Golang functions via Wasm/Wasi","type":"posts"},{"content":"The following blog post is a continuation of the previous blog post on Writing code to store items in memory with Golang. The previous blog post mostly covered simpler cases where we store something simple, like data in a single array/slice. However, what if we expanded our use case to store data in some sort of map instead (I know there is a concurrent map, sync.Map, but let\u0026rsquo;s pretend it doesn\u0026rsquo;t exist here)? How should we build a store that uses a hashmap to store key-value pairs?\nOur memory store would need a way to do the following:\nStore key-value pairs Get the value of a key Delete a key-value pair All actual manipulation of the hashmap object requires us to control access to it: there should be no concurrent access, as that might lead to data races and inconsistent, unexpected results. 
That means the store and delete operations go through channels that are handled by a single worker.\ntype storeItem struct { Key string Value string } type MemoryMapStore struct { items map[string]string addChan chan storeItem deleteChan chan string } We need to pass both the key and the value to the channel, so we\u0026rsquo;ll have the channel take a custom struct, which in this case is our storeItem struct.\nWhen creating the MemoryMapStore, we should also start the single goroutine that handles adding and removing key-value pairs from the items hashmap. This can be done via the following piece of code:\nfunc NewMemoryMapStore() *MemoryMapStore { initMap := map[string]string{} aChan := make(chan storeItem) dChan := make(chan string) m := MemoryMapStore{ items: initMap, addChan: aChan, deleteChan: dChan, } go m.runner() return \u0026amp;m } func (m *MemoryMapStore) runner() { for { select { case x := \u0026lt;-m.addChan: m.items[x.Key] = x.Value case y := \u0026lt;-m.deleteChan: delete(m.items, y) } } } The NewMemoryMapStore function starts the runner function, which handles the data coming in through the channels.\nThe next step is to write our Store, Get and Delete functions.\nfunc (m *MemoryMapStore) Store(key, value string) { m.addChan \u0026lt;- storeItem{key, value} } func (m *MemoryMapStore) Get(key string) (value string) { return m.items[key] } func (m *MemoryMapStore) Delete(key string) { m.deleteChan \u0026lt;- key } We can then check whether this data store works by running the following in the main function.\nfunc main() { a := NewMemoryMapStore() for i := 0; i \u0026lt; 10000; i++ { val := strconv.Itoa(i) go a.Store(val, val) } time.Sleep(5 * time.Second) fmt.Println(len(a.items)) for i := 0; i \u0026lt; 1000; i++ { val := strconv.Itoa(i) go a.Delete(val) } time.Sleep(5 * time.Second) fmt.Println(len(a.items)) } It should print 10000 and then 9000. 
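A caveat on the code above: Get, and the len(a.items) calls in main, read the map from outside the runner goroutine. Those reads can race with concurrent writes, and go run -race is likely to flag them. Below is a minimal sketch, not the original code, in which reads also go through the single worker via a reply channel. The names SafeMapStore, getRequest, getChan, lenChan and Len are hypothetical. A done channel replaces the time.Sleep calls, so main waits exactly until every Store has been accepted.

```go
package main

// storeItem carries a key-value pair to the worker goroutine.
type storeItem struct {
    Key   string
    Value string
}

// getRequest is a hypothetical read request: the worker answers on reply.
type getRequest struct {
    key   string
    reply chan string
}

// SafeMapStore (hypothetical name): only the runner goroutine touches items.
type SafeMapStore struct {
    items      map[string]string
    addChan    chan storeItem
    deleteChan chan string
    getChan    chan getRequest
    lenChan    chan chan int
}

func NewSafeMapStore() *SafeMapStore {
    m := &SafeMapStore{
        items:      map[string]string{},
        addChan:    make(chan storeItem),
        deleteChan: make(chan string),
        getChan:    make(chan getRequest),
        lenChan:    make(chan chan int),
    }
    go m.runner()
    return m
}

// runner is the single owner of the map; every operation, reads included,
// is applied here one at a time.
func (m *SafeMapStore) runner() {
    for {
        select {
        case x := <-m.addChan:
            m.items[x.Key] = x.Value
        case k := <-m.deleteChan:
            delete(m.items, k)
        case req := <-m.getChan:
            req.reply <- m.items[req.key]
        case reply := <-m.lenChan:
            reply <- len(m.items)
        }
    }
}

func (m *SafeMapStore) Store(key, value string) { m.addChan <- storeItem{key, value} }

func (m *SafeMapStore) Delete(key string) { m.deleteChan <- key }

// Get sends a request and blocks until the runner replies.
func (m *SafeMapStore) Get(key string) string {
    reply := make(chan string)
    m.getChan <- getRequest{key, reply}
    return <-reply
}

// Len asks the runner for the size instead of reading len(items) directly.
func (m *SafeMapStore) Len() int {
    reply := make(chan int)
    m.lenChan <- reply
    return <-reply
}

func main() {
    s := NewSafeMapStore()
    done := make(chan struct{})
    for n := 0; n < 1000; n++ {
        go func(n int) {
            // string(rune(n)) gives a distinct key per n without extra imports.
            key := string(rune(n))
            s.Store(key, key)
            done <- struct{}{}
        }(n)
    }
    // Wait until every Store call has been accepted by the runner.
    for n := 0; n < 1000; n++ {
        <-done
    }
    println(s.Len()) // prints 1000 (to stderr)
}
```

Because the runner is the only goroutine that ever touches items, no mutex is needed. The trade-off is that every read now costs a channel round trip. For read-heavy workloads, a sync.RWMutex or sync.Map may be the simpler choice.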
We can increase the number of goroutines and the counts should still add up (the only thing to take note of is the impact on CPU, as it\u0026rsquo;s actually utilizing resources on your computer).\nThis is simply an exercise to understand how we can utilize channels to safely store and handle data in a structure that was not originally built for concurrent access.\nThe full code would be this:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;strconv\u0026#34; \u0026#34;time\u0026#34; ) func main() { a := NewMemoryMapStore() for i := 0; i \u0026lt; 10000; i++ { val := strconv.Itoa(i) go a.Store(val, val) } time.Sleep(5 * time.Second) fmt.Println(len(a.items)) for i := 0; i \u0026lt; 1000; i++ { val := strconv.Itoa(i) go a.Delete(val) } time.Sleep(5 * time.Second) fmt.Println(len(a.items)) } type storeItem struct { Key string Value string } type MemoryMapStore struct { items map[string]string addChan chan storeItem deleteChan chan string } func NewMemoryMapStore() *MemoryMapStore { initMap := map[string]string{} aChan := make(chan storeItem) dChan := make(chan string) m := MemoryMapStore{ items: initMap, addChan: aChan, deleteChan: dChan, } go m.runner() return \u0026amp;m } func (m *MemoryMapStore) runner() { for { select { case x := \u0026lt;-m.addChan: m.items[x.Key] = x.Value case y := \u0026lt;-m.deleteChan: delete(m.items, y) } } } func (m *MemoryMapStore) Store(key, value string) { m.addChan \u0026lt;- storeItem{key, value} } func (m *MemoryMapStore) Get(key string) (value string) { return m.items[key] } func (m *MemoryMapStore) Delete(key string) { m.deleteChan \u0026lt;- key } ","date":"9 August 2023","externalUrl":null,"permalink":"/writing-code-to-store-items-in-memory-with-golang-but-with-maps/","section":"Posts","summary":"The following blog post is a continuation of the previous blog post on Writing code to store items in memory with Golang. 
The previous blog post mostly covered simpler cases where we were storing something simple, like data in a single array/slice. However, let’s say we were to expand our use case to store data in some sort of map instead (I know there is a concurrent map version, sync.Map, but let’s pretend it doesn’t exist here). How shall we build a store which uses a hashmap to store key-value pairs?\n","title":"Writing code to store items in memory with Golang but with maps","type":"posts"},{"content":"I have a tiny application, an HTTP API server, that is meant to store data temporarily in memory. There is no need to persist data into any file or even a database. The data that is to be stored doesn\u0026rsquo;t need to persist across restarts, which makes it pointless to rely on files or databases.\nTechnically, I could rely on a tool like Redis, but that would mean depending on another component (for a small piece of data). Redis is overkill for this tiny application: it provides a lot of functionality, but all I need is \u0026ldquo;something\u0026rdquo; I can push data to and pull data from. Also, there will only be 1 instance of the HTTP API server, which means it doesn\u0026rsquo;t make much sense to set up Redis so that multiple API servers can access a \u0026ldquo;central\u0026rdquo; memory store.\nIf one builds such a memory store naively in Golang, it will cause some issues. 
Let\u0026rsquo;s see a potential naive implementation.\nThis would be the interface used within main.go:\ntype Store interface { Store(x int) View() []int } Let\u0026rsquo;s say our memory store implementation looks like this:\ntype MemoryStore struct { items []int } func NewMemoryStore() *MemoryStore { m := MemoryStore{ items: []int{}, } return \u0026amp;m } func (m *MemoryStore) Store(x int) { m.items = append(m.items, x) } func (m *MemoryStore) View() []int { dst := make([]int, len(m.items)) copy(dst, m.items) return dst } We can test the implementation by having this in our main.go file. We need to check that our memory store can handle concurrent store/view requests and return responses consistently. Each round of tests should give the same response every time: if we put 10 data points into it, it should return 10 data points. To check that it can handle concurrent requests to store data, we can use the go keyword to start separate goroutines that push data into the memory store in parallel.\nfunc main() { z := store.NewMemoryStore() hoho(z) time.Sleep(15 * time.Second) fmt.Println(len(z.View())) } func Adder(a store.Store, x int) { for i := 0; i \u0026lt; 10; i++ { a.Store(x) time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) } } func hoho(a store.Store) { for i := 0; i \u0026lt; 5; i++ { go Adder(a, i) } } At first glance, this naive implementation doesn\u0026rsquo;t seem to have much of an issue. The above code typically returns 50 items on every run, since with only 5 goroutines and random sleeps, concurrent appends rarely collide. 
However, if we increase the number of items and the number of goroutines\u0026hellip;\nfunc Adder(a store.Store, x int) { for i := 0; i \u0026lt; 100; i++ { a.Store(x) time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) } } func hoho(a store.Store) { for i := 0; i \u0026lt; 100; i++ { go Adder(a, i) } } We will start to see issues. The number of items stored at the end of the \u0026ldquo;store\u0026rdquo; step varies between roughly 9000 and 10000, instead of always being 10000.\nThe issue is that the Store function modifies a slice that doesn\u0026rsquo;t support concurrent access: append reads m.items and then writes the result back, so when two goroutines do this at the same time, one of the appends can be lost. This data race happens more often as more goroutines access the slice. To resolve it, we can use a \u0026ldquo;fan in\u0026rdquo; approach for a variable that may receive parallel modifications: a single goroutine deals with all modifications to the variable, and no other goroutine accesses or manipulates it. 
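A minimal sketch of this fan-in idea, assuming reads should also be served by the owning goroutine: the addChan and viewChan names are hypothetical (the version that follows uses a channel called zz and reads items directly in View), and a sync.WaitGroup stands in for the fixed time.Sleep in main.

```go
package main

import (
	`fmt`
	`sync`
)

// MemoryStore sketch: one goroutine (start) owns items; everyone else
// talks to it over channels.
type MemoryStore struct {
	items    []int
	addChan  chan int
	viewChan chan chan []int
}

func NewMemoryStore() *MemoryStore {
	m := &MemoryStore{
		addChan:  make(chan int),
		viewChan: make(chan chan []int),
	}
	go m.start()
	return m
}

func (m *MemoryStore) start() {
	for {
		select {
		case x := <-m.addChan:
			m.items = append(m.items, x)
		case reply := <-m.viewChan:
			// Hand back a copy so callers never share the backing array.
			dst := make([]int, len(m.items))
			copy(dst, m.items)
			reply <- dst
		}
	}
}

func (m *MemoryStore) Store(x int) { m.addChan <- x }

func (m *MemoryStore) View() []int {
	reply := make(chan []int)
	m.viewChan <- reply
	return <-reply
}

func main() {
	s := NewMemoryStore()
	var wg sync.WaitGroup
	// 100 goroutines x 100 items, the scaled-up case from the post.
	for g := 0; g < 100; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				s.Store(g)
			}
		}(g)
	}
	wg.Wait()
	fmt.Println(len(s.View())) // 10000 on every run
}
```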
That goroutine can receive the values to be stored via a channel.\nThe following modified version of the memory store is a better version of the above:\ntype MemoryStore struct { items []int zz chan int } func NewMemoryStore() *MemoryStore { m := MemoryStore{ items: []int{}, zz: make(chan int), } go m.start() return \u0026amp;m } func (m *MemoryStore) start() { for { select { case x := \u0026lt;-m.zz: m.items = append(m.items, x) } } } func (m *MemoryStore) Store(x int) { m.zz \u0026lt;- x } func (m *MemoryStore) View() []int { dst := make([]int, len(m.items)) copy(dst, m.items) return dst } The important bit is the following subsection of the above code:\nfunc (m *MemoryStore) start() { for { select { case x := \u0026lt;-m.zz: m.items = append(m.items, x) } } } This function runs on a separate goroutine, started within the NewMemoryStore function. Code outside this package can\u0026rsquo;t access the data directly, which \u0026ldquo;protects\u0026rdquo; it from external, parallel modification.\nNow, if we increase the number of goroutines in the hoho function to 5000, it should be a non-issue: there is only 1 writer, which ensures that the data going in stays consistent.\n","date":"2 August 2023","externalUrl":null,"permalink":"/writing-code-to-store-items-in-memory-with-golang/","section":"Posts","summary":"I have a tiny application, an HTTP API server, that is meant to store data temporarily in memory. There is no need to persist data into any file or even a database. The data that is to be stored doesn’t need to persist across restarts, which makes it pointless to rely on files or databases.\n","title":"Writing code to store items in memory with Golang","type":"posts"},{"content":" Motivation for finding an emulator for Google Cloud Datastore # Many real-world applications require databases to persist data. 
When an application depends on databases such as MySQL, MariaDB or PostgreSQL, we can create some form of \u0026ldquo;staging\u0026rdquo; server to test that the application works as expected. We can even check that database migrations work without too many issues: we can export some of the data from production and import it into the staging environment to make sure that everything works.\nWith Docker, this process is made much easier. We developers no longer need to think about how to set up databases on our machines and \u0026ldquo;pollute\u0026rdquo; them with various installations of MySQL, MariaDB or any other databases that our applications use. We can simply pull in the right version of the database, run it and test our code.\nThis whole setup works when we rely on databases that are not tied to a cloud vendor. However, what if we relied on something like Google Cloud Datastore? There isn\u0026rsquo;t really a Docker image out there dedicated to Google Cloud Datastore that exposes its interface for applications to test against. We would need to test the entire flow, including the integration of our codebase with the google-cloud libraries that it imports. 
We can\u0026rsquo;t simply switch to a \u0026ldquo;fake\u0026rdquo; version, as that wouldn\u0026rsquo;t test our end-to-end integration with the Google Cloud Datastore database.\nLuckily, the gcloud command has emulator tooling built in, so we can run a local Google Cloud Datastore that works with the official Google Cloud Datastore libraries.\nRefer to the following link for the full source code: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicUsingDatastore\nThe example below is presented with Golang code.\nRunning Google Cloud Datastore Emulator # The key to running the Cloud Datastore emulator is getting the Docker image running. We can do so by running the following command:\ndocker run -p 8081:8081 google/cloud-sdk:437.0.1 gcloud beta emulators datastore start --project=test --host-port=0.0.0.0:8081 Note that it is exposed on port 8081. We also need to tell the emulator which address to bind to; in our case, we ask it to bind to 0.0.0.0 so that it accepts traffic from anywhere, including from outside the container.\nThe next important bit is to provide certain environment variables that are used by the official Google Cloud Golang libraries: DATASTORE_PROJECT_ID and DATASTORE_EMULATOR_HOST. DATASTORE_EMULATOR_HOST is the one that tells the Golang Datastore library to connect to the emulator. If it is not set, the library will always try to connect to the real Google Cloud Datastore product over the internet.\nDATASTORE_PROJECT_ID=test DATASTORE_EMULATOR_HOST=localhost:8081 go run main.go The rest of the code is similar to how one would write code for adding, deleting and listing resources in any database (which in our case is Google Cloud Datastore).\nFurther Thoughts # With this, we can replicate a mechanism that some people have been using for automated integration testing. 
We can test our code against an actual database using https://testcontainers.com/. With this library, we can programmatically start the Cloud Datastore emulator and then run tests to check that our code integrates properly with it. That is \u0026ldquo;lighter\u0026rdquo; than testing against an actual Cloud Datastore in a Google Cloud Platform project. If we were to use the actual Google Cloud Datastore, we would need to clean up the database after our integration tests are done, which would add a huge amount of \u0026ldquo;pain\u0026rdquo; to our work.\n","date":"26 July 2023","externalUrl":null,"permalink":"/using-emulators-for-testing-google-cloud-datastore-integration/","section":"Posts","summary":"Motivation for finding an emulator for Google Cloud Datastore # Many real-world applications require databases to persist data. When an application depends on databases such as MySQL, MariaDB or PostgreSQL, we can create some form of “staging” server to test that the application works as expected. We can even check that database migrations work without too many issues: we can export some of the data from production and import it into the staging environment to make sure that everything works.\n","title":"Using Emulators for testing Google Cloud Datastore integration","type":"posts"},{"content":"Part of the software engineer\u0026rsquo;s journey is learning data structures, especially if one is preparing for software interviews. 
Surprisingly, knowledge of and familiarity with data structures becomes somewhat important in them: with knowledge of certain data structures, certain problems become easier (also, sometimes all one can do is stare in wonder at the algorithms and data structures that people in the past created).\nOne relatively important data structure that came up during my study of data structures is the heap. It is often mentioned that a heap makes it easy to get the \u0026ldquo;max\u0026rdquo; value.\nThe following page shows the heap data structure in more detail, with diagrams etc.:\nhttps://www.geeksforgeeks.org/heap-data-structure/\nThe heap data structure is usually represented or imagined as a tree, which is why my initial test implementation builds something close to a tree:\ntype Node struct { value int left *Node right *Node } func Heapify(n *Node) *Node { if n.left == nil \u0026amp;\u0026amp; n.right == nil { return n } if n.left != nil { n.left = Heapify(n.left) } if n.right != nil { n.right = Heapify(n.right) } if n.left != nil { if n.value \u0026lt; n.left.value { tempLeft := n.left.left tempRight := n.left.right currentRight := n.right currentLeft := n.left currentLeft.right = currentRight currentLeft.left = n n.left = tempLeft n.right = tempRight n = currentLeft } } if n.right != nil { if n.value \u0026lt; n.right.value { tempLeft := n.right.left tempRight := n.right.right currentRight := n.right currentLeft := n.left currentRight.right = n currentRight.left = currentLeft n.left = tempLeft n.right = tempRight n = currentRight } } return n } func Printer(n *Node) { if n.left != nil { Printer(n.left) } fmt.Println(n.value) if n.right != nil { Printer(n.right) } } We have a Node struct, which we use as the nodes of a tree. We can then keep calling Heapify to bubble the \u0026ldquo;max\u0026rdquo; value to the top. 
Unfortunately, the node struct version is much harder to build within a coding interview session; there is too much code to handle.\nWe can test the above node struct version of a heap by running the following function:\nfunc nodeImplementation() { leftz := Node{value: 3} leftLeftz := Node{value: 4} rightz := Node{value: 2, right: \u0026amp;leftLeftz} topz := Node{value: 1, left: \u0026amp;leftz, right: \u0026amp;rightz} Printer(\u0026amp;topz) aa := Heapify(\u0026amp;topz) fmt.Println(\u0026#34;after\u0026#34;) Printer(aa) fmt.Println(aa.value) } Unfortunately, it is also harder to implement other useful and important heap operations, such as adding values to or removing values from a heap. The code for that is more involved than expected.\nInterestingly enough, heaps can also be implemented with a slice/array. We simply imagine the array laid out across the tree:\n0 / \\ 1 2 / \\ / \\ 3 4 5 6 The above tree shows the index each node would have if the tree were stored in a slice/array.\nThe code below implements the heap data structure on top of an array. The important bit is the set of formulas:\nLeft side node: 2n + 1 Right side node: 2n + 2 Current node: n Parent node: (n-1)/2 The above formulas calculate the indexes of the other \u0026ldquo;nodes\u0026rdquo; in the array. Let\u0026rsquo;s demonstrate with an example:\nFor node 1, the child nodes are 3 and 4. Using the formula for the left side node, 2 x 1 + 1 = 3, which shows that the calculation is right. The same holds for the right side node: 2 x 1 + 2 = 4. Finally, consider the parent of node 1, which is node 0. 
We can use the parent formula for it as well: (1-1)/2 = 0, which is also correct; the parent of \u0026ldquo;node 1\u0026rdquo; is node 0.\nThe Golang code for building a heap is the following:\nfunc ArrHeapify(nums []int, node int) { lhsIdx := 2*node + 1 rhsIdx := 2*node + 2 largestIdx := node if lhsIdx \u0026lt; len(nums) { if nums[lhsIdx] \u0026gt; nums[largestIdx] { largestIdx = lhsIdx } } if rhsIdx \u0026lt; len(nums) { if nums[rhsIdx] \u0026gt; nums[largestIdx] { largestIdx = rhsIdx } } if largestIdx != node { tempVal := nums[node] nums[node] = nums[largestIdx] nums[largestIdx] = tempVal ArrHeapify(nums, largestIdx) } } We can write the following driver code to test our implementation:\nfunc main() { a := []int{1, 3, 5, 3, 6, 13, 10, 9, 8, 15, 17} fmt.Println(a) for i := (len(a) - 1) / 2; i \u0026gt;= 0; i-- { ArrHeapify(a, i) } fmt.Println(a) a = append(a, 90) for i := (len(a) - 1) / 2; i \u0026gt;= 0; i-- { ArrHeapify(a, i) } fmt.Println(a) } Naturally, this piece of code is not perfect (for instance, after appending 90, it re-heapifies the whole slice instead of just sifting the new value up), but then again, for software interviews, it\u0026rsquo;ll be something that we eventually have to build without thinking too much about it. Maybe I\u0026rsquo;ll write a future blog post that covers it in more detail.\n","date":"19 July 2023","externalUrl":null,"permalink":"/heap-datastructure-with-slices/arrays-in-golang/","section":"Posts","summary":"Part of the software engineer’s journey is learning data structures, especially if one is preparing for software interviews. 
Surprisingly, knowledge of and familiarity with data structures becomes somewhat important in them: with knowledge of certain data structures, certain problems become easier (also, sometimes all one can do is stare in wonder at the algorithms and data structures that people in the past created).\n","title":"Heap datastructure with Slices/Arrays in Golang","type":"posts"},{"content":"In certain application scenarios, applications need to do client-side load balancing across a set of servers. Such cases are pretty rare, and we won\u0026rsquo;t be covering the exact reasons or scenarios for when these are needed. Instead, we will cover how we can do so with Golang applications in a Kubernetes cluster.\nBuilding out the Golang application # We need 2 types of applications to demonstrate this. The first contacts the servers, which can scale up and down. This is the \u0026ldquo;firer\u0026rdquo; application that fires HTTP requests; it queries the headless service (via a DNS lookup). The other application is a simple HTTP server that returns some simple text (with the current datetime) to show that the request is real and to differentiate the requests in the server logs.\nWe can build 1 simple Golang application that switches between the 2 modes: \u0026ldquo;firer\u0026rdquo; vs \u0026ldquo;server\u0026rdquo;. The code below is the entire codebase; it still needs to be wrapped in a Docker image etc. before it can be deployed to the server.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;io\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;os\u0026#34; \u0026#34;time\u0026#34; ) func firer() { hostName := os.Getenv(\u0026#34;SERVER_HOST\u0026#34;) if hostName == \u0026#34;\u0026#34; { fmt.Println(\u0026#34;hostname not defined. 
will exit\u0026#34;) os.Exit(1) } for { ips, err := net.LookupIP(hostName) if err != nil { fmt.Printf(\u0026#34;unexpected error while looking up ips: %v\u0026#34;, err) time.Sleep(2 * time.Second) continue } for _, ip := range ips { fmt.Printf(\u0026#34;%v ips found. Will contact ip: %v\u0026#34;, len(ips), ip.String()) time.Sleep(2 * time.Second) resp, err := http.Get(fmt.Sprintf(\u0026#34;http://%v:8080\u0026#34;, ip.String())) if err != nil { fmt.Printf(\u0026#34;unexpected error when contacting: %v\\n\u0026#34;, err) continue } raw, _ := io.ReadAll(resp.Body) resp.Body.Close() fmt.Printf(\u0026#34;Output from ip: %v, %v\u0026#34;, ip.String(), string(raw)) } } } func server() { port := 8080 http.HandleFunc(\u0026#34;/\u0026#34;, helloWorldHandler) log.Printf(\u0026#34;Server starting on port %v\\n\u0026#34;, port) log.Fatal(http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, port), nil)) } func helloWorldHandler(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;serving\u0026#34;, r.URL) fmt.Fprintf(w, \u0026#34;This is a test. Hello World Miaoza!! Time: %v\\n\u0026#34;, time.Now()) } func main() { mode := os.Getenv(\u0026#34;MODE\u0026#34;) if mode == \u0026#34;firer\u0026#34; { firer() } else if mode == \u0026#34;server\u0026#34; { server() } else { panic(\u0026#34;Mode not properly defined. Will terminate\u0026#34;) } } The most critical piece of the above code is the following:\nips, err := net.LookupIP(hostName) if err != nil { fmt.Printf(\u0026#34;unexpected error while looking up ips: %v\u0026#34;, err) } This part attempts to resolve our k8s service. 
Normally, with DNS resolution, one hostname resolves to 1 IP address, so we would usually not bother doing the query and handling its results within our codebase. A headless service is different: its DNS name resolves to the IPs of the individual pods behind it.\nWhen deploying the above application, we need to deploy the following k8s Service object. NOTE: there is one very important line that converts it from a \u0026ldquo;normal\u0026rdquo; Kubernetes service to a headless one.\napiVersion: v1 kind: Service metadata: labels: app: server component: server name: headless-server spec: ports: - name: http port: 8080 protocol: TCP targetPort: 8080 selector: app: server component: server type: ClusterIP clusterIP: None The most important line here is clusterIP: None. This tells Kubernetes not to provision a cluster IP for this service, and instead to expose the IPs of all the pods that match the labels in its selector.\nDeploying the headless server and firer # We can refer to the following codebase: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/headlessService\nIf we use the above codebase, we first build the following Docker image.\ndocker build -t gcr.io/\u0026lt;project id\u0026gt;/headless-service-app:v3 . We can then push the image into the container registry:\ndocker push gcr.io/\u0026lt;project id\u0026gt;/headless-service-app:v3 After that, once we have a GKE cluster, we can use the kustomize tool to deploy the services.\nkustomize build . | kubectl apply -f - The initial deployment only has 1 replica of the server. We can scale it out to 4 replicas.\nkubectl scale deployment server --replicas=4 Once we have 4 replicas, we can view the logs of our firer application.\n4 ips found. Will contact ip: 10.8.0.12Output from ip: 10.8.0.12, This is a test. Hello World Miaoza!! Time: 2023-10-22 13:13:43.428359366 +0000 UTC m=+3105.938009744 4 ips found. Will contact ip: 10.8.0.146Output from ip: 10.8.0.146, This is a test. Hello World Miaoza!! 
Time: 2023-10-22 13:13:45.431671786 +0000 UTC m=+3355.873443571 4 ips found. Will contact ip: 10.8.0.13Output from ip: 10.8.0.13, This is a test. Hello World Miaoza!! Time: 2023-10-22 13:13:47.435236353 +0000 UTC m=+3109.942062831 4 ips found. Will contact ip: 10.8.0.66Output from ip: 10.8.0.66, This is a test. Hello World Miaoza!! Time: 2023-10-22 13:13:49.449002043 +0000 UTC m=+2997.637140403 References # Refer to the following resources:\nFull code demo for this: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/headlessService Headless service: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services ","date":"12 July 2023","externalUrl":null,"permalink":"/deploy-golang-apps-that-interact-with-headless-service-in-kubernetes/","section":"Posts","summary":"In certain application scenarios, applications need to do client-side load balancing across a set of servers. Such cases are pretty rare, and we won’t be covering the exact reasons or scenarios for when these are needed. Instead, we will cover how we can do so with Golang applications in a Kubernetes cluster.\n","title":"Deploy Golang Apps that interact with headless service in Kubernetes","type":"posts"},{"content":"This question often comes up during system design interviews: if one were to design a system that requires a cache, should one use Memcached or Redis? 
At first glance, both do the same thing: both store data in memory, which gives them fast response times. However, the two tools have wildly different implementations and philosophies, which requires developers to make tradeoffs when choosing between them.\nThe common things to ponder when it comes to the question of Memcached vs Redis are these:\nMemcached is very simplistic; Redis is very feature-rich and can store complex data models Memcached doesn\u0026rsquo;t have a cluster mode; Redis has a cluster mode to handle higher throughput. (This means that for Memcached, \u0026ldquo;cluster\u0026rdquo; mode has to rely on clients, which need to implement all that logic) Memcached is multi-threaded while Redis is \u0026ldquo;single threaded\u0026rdquo;. This means that if any operation is blocking, no other requests can be served until it\u0026rsquo;s done. The following information is also in Devops Interview Questions.\nNow let\u0026rsquo;s look from a more detailed angle: how do these differences show up when using them in applications?\nUsing Golang to access Memcached # Weirdly enough, there isn\u0026rsquo;t an \u0026ldquo;official\u0026rdquo; Golang module for calling Memcached. However, this package comes up quite a bit with a quick search: https://pkg.go.dev/github.com/bradfitz/gomemcache/memcache\nA very interesting thing to note is the following line from the README.md.\nmc := memcache.New(\u0026#34;10.0.0.1:11211\u0026#34;, \u0026#34;10.0.0.2:11211\u0026#34;, \u0026#34;10.0.0.3:11212\u0026#34;) This very line reflects how simplistic Memcached is: it doesn\u0026rsquo;t have a \u0026ldquo;clustering\u0026rdquo; solution, so the client is handed the full list of servers. Clustering is a pretty complex feature to implement, and it makes sense not to add that feature unnecessarily. 
Many people already find Memcached useful as it is, so \u0026ldquo;technically\u0026rdquo; there isn\u0026rsquo;t a need to add such features.\nA simple usage of Golang with Memcached looks as follows:\nFirst, we need to run a Memcached Docker image:\ndocker run --name my-memcache -p 11211:11211 -d memcached:1.6 memcached -m 64 We can then run the following Golang code (of course, we need to set up the go.mod and go.sum files first)\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;time\u0026#34; \u0026#34;github.com/bradfitz/gomemcache/memcache\u0026#34; ) func main() { mc := memcache.New(\u0026#34;localhost:11211\u0026#34;) mc.Set(\u0026amp;memcache.Item{Key: \u0026#34;foo\u0026#34;, Value: []byte(\u0026#34;my value\u0026#34;)}) zz, err := mc.Get(\u0026#34;foo\u0026#34;) if err != nil { panic(fmt.Sprintf(\u0026#34;didn\u0026#39;t expect error from getting values from memcached %v\u0026#34;, err)) } fmt.Printf(\u0026#34;Value of foo: %v\\n\u0026#34;, string(zz.Value)) addErr := mc.Add(\u0026amp;memcache.Item{Key: \u0026#34;foo\u0026#34;, Value: []byte(\u0026#34;new value\u0026#34;)}) if addErr != nil { fmt.Printf(\u0026#34;Add error: %v\\n\u0026#34;, addErr) } appendErr := mc.Append(\u0026amp;memcache.Item{Key: \u0026#34;foo\u0026#34;, Value: []byte(\u0026#34;new value\u0026#34;)}) if appendErr != nil { fmt.Printf(\u0026#34;Append error: %v\\n\u0026#34;, appendErr) } pp, _ := mc.Get(\u0026#34;foo\u0026#34;) fmt.Printf(\u0026#34;Value of foo: %v\\n\u0026#34;, string(pp.Value)) mc.Set(\u0026amp;memcache.Item{Key: \u0026#34;yar\u0026#34;, Value: []byte(\u0026#34;yar\u0026#34;), Expiration: 10}) time.Sleep(5 * time.Second) yy, err := mc.Get(\u0026#34;yar\u0026#34;) if err != nil { panic(fmt.Sprintf(\u0026#34;didn\u0026#39;t expect error from getting values from memcached %v\\n\u0026#34;, err)) } fmt.Printf(\u0026#34;Value of yar: %v\\n\u0026#34;, string(yy.Value)) time.Sleep(6 * time.Second) _, err = mc.Get(\u0026#34;yar\u0026#34;) if err != nil { fmt.Printf(\u0026#34;Expected 
error: %v\\n\u0026#34;, err) } } We aren\u0026rsquo;t testing the client-side sharding of Memcached keys here; it is hard to fully demonstrate and test that functionality with simple code. As for how the keys are sharded: the client hashes the key and uses the hash to pick one of the servers.\nUsing Golang to access Redis # When we look at the commands available in Redis, we can clearly see how feature-rich it is (and how overwhelming it can be for first-time users). Redis comes with a lot of functionality and can cover a large variety of use cases. It can even persist its data to disk so that it can recover rather quickly if the server happens to \u0026ldquo;crash\u0026rdquo; in a disastrous fashion. (There doesn\u0026rsquo;t seem to be any mention of Memcached having such features.)\nLet\u0026rsquo;s see a clear example, via Golang code, of something that is supported in Redis but not in Memcached:\nTo start a Redis server via Docker:\ndocker run --name some-redis -p 6379:6379 -d redis Then we can use the following code to drive and test out some Redis functionality:\npackage main import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;time\u0026#34; redis \u0026#34;github.com/redis/go-redis/v9\u0026#34; ) func main() { rdb := redis.NewClient(\u0026amp;redis.Options{ Addr: \u0026#34;localhost:6379\u0026#34;, Password: \u0026#34;\u0026#34;, // no password set DB: 0, // use default DB }) status := rdb.Set(context.TODO(), \u0026#34;foo\u0026#34;, \u0026#34;zzz\u0026#34;, 20*time.Second) if status.Err() != nil { panic(fmt.Sprintf(\u0026#34;error observed: %v\\n\u0026#34;, status.Err())) } fmt.Printf(\u0026#34;%+v\\n\u0026#34;, status) val := rdb.Get(context.TODO(), \u0026#34;foo\u0026#34;) fmt.Printf(\u0026#34;Value of foo: %v\\n\u0026#34;, val.Val()) fmt.Printf(\u0026#34;Value of foo: %v\\n\u0026#34;, val.String()) zz := 
rdb.HSet(context.TODO(), \u0026#34;zzz\u0026#34;, map[string]interface{}{\u0026#34;aa\u0026#34;: \u0026#34;qcaca\u0026#34;, \u0026#34;aqq\u0026#34;: 12}) if zz.Err() != nil { panic(fmt.Sprintf(\u0026#34;zz error observed: %v\\n\u0026#34;, zz.Err())) } yy := rdb.HGet(context.TODO(), \u0026#34;zzz\u0026#34;, \u0026#34;aqq\u0026#34;) fmt.Printf(\u0026#34;Value of zzz-aqq: %v\\n\u0026#34;, yy.Val()) } Note the HSet and HGet functions. They allow us to store a hashmap in Redis and afterwards pull specific fields out of it, kind of like a \u0026ldquo;hashmap\u0026rdquo; within a \u0026ldquo;hashmap\u0026rdquo;. To do something similar in Memcached, we would first need to serialize our data structure into some byte format, which we can then store as the value of a key. To get a specific value, we would need to fetch the whole value, deserialize it and then pull the specific field out.\nConclusion # Redis and Memcached are clearly 2 different products with different aims. Memcached remains a \u0026ldquo;sane\u0026rdquo; and simple choice while Redis provides plenty of flexible options; which caching tool to use boils down to the needs of the application being built.\nIn the future, I will probably try to cover other Redis functions via Golang in more detail.\n","date":"5 July 2023","externalUrl":null,"permalink":"/redis-vs-memcached-via-golang/","section":"Posts","summary":"This question often comes up during system design interviews: if one were to design a system that requires a cache, should one use Memcached or Redis? 
At first glance, both seem to do the same thing: both store data in memory, which gives them pretty fast response times. However, the two tools have wildly different implementations and philosophies, which requires developers to make tradeoffs when choosing between them.\n","title":"Redis vs Memcached via Golang","type":"posts"},{"content":"A common architectural pattern for relational databases is to create an additional replica server. This pattern usually comes up because most applications are read heavy - data is mostly read so that it can be presented to users.\nThis blog post shows how we can quickly get started (naturally, there could be better configuration to use here, such as limiting which databases are replicated to the replicas).\nSetting up MariaDB on server # We are not utilizing the managed database solutions provided by the cloud vendors - we won\u0026rsquo;t learn much if we simply rely on them.\nFirst, we would need to create a normal Linux (Debian) server. We would then need to install the MariaDB server and its corresponding client.\nsudo apt update sudo apt install -y mariadb-server mariadb-client We can check that the database is installed correctly by first going into the MySQL CLI tool.\nmysql Then, we can list the databases within it by running the following SQL command:\nSHOW DATABASES; It should respond with the following:\n+--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | +--------------------+ Testing the installed Database with an application # Now that we have MariaDB installed, we need something to simulate the application that inserts data into the database. We can utilize the following application - it even runs its own migration step, so no separate SQL script is required. 
https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/appwithmysql\nTo use the application from the GitHub link, we first need to create the database as well as the corresponding user.\nCREATE DATABASE testmysql; CREATE USER username IDENTIFIED BY \u0026#39;password\u0026#39;; GRANT ALL PRIVILEGES ON `testmysql`.* TO \u0026#39;username\u0026#39;; Once we have this in place, we can scp the application binary to the server. We should be able to run it with no issue. I\u0026rsquo;m assuming the binary is named recordmaker.\nscp recordmaker \u0026lt;ssh user\u0026gt;@\u0026lt;ip address\u0026gt;:/home/\u0026lt;ssh user\u0026gt;/recordmaker After that, we can ssh into the server and start the recordmaker binary. If there are permission issues, we might need to make it executable with chmod +x.\n# In /home/\u0026lt;ssh user\u0026gt;/ ./recordmaker Alter server to be the primary database server # Now that we have an application to test the entire mechanism, we first need to set up the primary server. We also need to ensure that the primary is accessible by the replicas. By default, MariaDB is bound to 127.0.0.1, so it cannot be accessed by hosts outside the server it resides on. We need to change this to 0.0.0.0. This is done in the following file: /etc/mysql/mariadb.conf.d/50-server.cnf\nIn the mysqld section\n[mysqld] ... #Other configuration # bind-address 127.0.0.1 - change this to 0.0.0.0 (as in the next line) bind-address = 0.0.0.0 ... # Other configuration In the mariadb section\n[mariadb] log-bin log-basename=master1 We would then need to restart the database for MariaDB to pick up these configuration changes, since they are not applied on the fly. We need to make sure that the database is properly bound to 0.0.0.0. 
We can run the following to check it: netstat -tlnp\n# netstat -tlnp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 4369/mariadbd The next step is to go into the MySQL CLI once again and create the following user:\nCREATE USER \u0026#39;replication_user\u0026#39;@\u0026#39;%\u0026#39; IDENTIFIED BY \u0026#39;bigs3cret\u0026#39;; GRANT REPLICATION SLAVE ON *.* TO \u0026#39;replication_user\u0026#39;@\u0026#39;%\u0026#39;; Create the replica server # We would need to repeat the steps to set up MariaDB on the replica server. We would also need to adjust its configuration as follows.\nFor the mysqld section\n[mysqld] ... #Other configuration server-id = 2 ... # Other configuration For the mariadb section\n[mariadb] log-bin log-basename=slave1 We would also need to restart the database after this.\nCopy the data over from primary to replicas # This is the important bit: we need to \u0026ldquo;bootstrap\u0026rdquo; our replica server with the data from our primary server. I tried without it and replication did not work (unless both the primary and the replica are set up before any data is put into the primary).\nThe instructions for this are available in the following section of the replication reference page in the MariaDB documentation: https://mariadb.com/kb/en/setting-up-replication/#getting-the-masters-binary-log-co-ordinates. For this blog post, we will just list the commands that make this happen.\n# In the primary server # In MySQL tool FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS; #make sure we copy the values for log file and log pos. 
It is needed for a later section # In a separate bash session on the primary db (keep the MySQL session holding the lock open) mariadb-dump --all-databases \u0026gt; backup-file.sql We would then copy the dump file over to the replica server and load it there:\nmariadb \u0026lt; backup-file.sql This bootstraps our replica database with the required data.\nThen, on the primary server, we can simply run the following command:\nUNLOCK TABLES; Final configuration of the replica MariaDB server # We need to run the following command on our replica server.\nCHANGE MASTER TO MASTER_HOST=\u0026#39;instance-1\u0026#39;, MASTER_USER=\u0026#39;replication_user\u0026#39;, MASTER_PASSWORD=\u0026#39;bigs3cret\u0026#39;, MASTER_PORT=3306, MASTER_LOG_FILE=\u0026#39;master1-bin.000001\u0026#39;, MASTER_LOG_POS=330, MASTER_CONNECT_RETRY=10; We can then start the slave thread:\nSTART SLAVE; We can then check that the slave is running and replication works as expected.\nSHOW SLAVE STATUS \\G If there are any issues, check out the following forum page:\nhttps://stackoverflow.com/questions/1724191/mysql-slave-i-o-thread-not-running\nOne more round of testing # To make sure that the entire replication process is working, we can use our friend recordmaker, which creates database records. While it runs, we can go to the replica server and repeatedly run the following SQL query:\nselect * from `testmysql`.`users` order by `updated_at` desc limit 10 ; We will see some slight replication delay, but it should be acceptable for application use. The delay was 20-30s at times, which could be due to the amount of data being generated by the recordmaker tool.\nFinal thoughts # Setting up the above is such a pain - automation is definitely needed. 
The entire process is error prone - a single misstep can easily lead to broken replication and a corrupted replica that is impossible to use.\nReferences # This blog post heavily references the following page:\nhttps://mariadb.com/kb/en/setting-up-replication/ Exposing MySQL to other hosts: https://mariadb.com/kb/en/configuring-mariadb-for-remote-client-access/ https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/appwithmysql https://stackoverflow.com/questions/21664091/mariadb-not-allowing-remote-connections https://dba.stackexchange.com/questions/51076/copy-all-data-to-slave-before-mysql-replication-connect ","date":"28 June 2023","externalUrl":null,"permalink":"/trying-to-create-mariadb-replica-server/","section":"Posts","summary":"A common architectural pattern for relational databases is to create an additional replica server. This pattern usually comes up because most applications are read heavy - data is mostly read so that it can be presented to users.\n","title":"Trying to create MariaDB replica server","type":"posts"},{"content":"After coding in both Python and Golang, I now have a very strong preference for statically typed languages. There is a certain charm and beauty in having the IDE I\u0026rsquo;m working in provide good autocomplete suggestions for the code - there is less need to keep jumping between files in the codebase just to check that function names and parameters are correct. For smaller programs, dynamically typed languages are still OK, but they get very unwieldy once they go past the hundreds-of-lines-of-code mark.\nIn a previous post, Writing static python with mypy, I finally started playing around with static Python via the mypy library and the utilities that surround it. 
That post provided some simple examples that would potentially cover some of the more common use cases.\nHowever, even with all that, there is one thing that I really like in Golang that I was looking for in static Python - the interface (in Golang\u0026rsquo;s terms). Interfaces allow us to substitute in different implementations without tying us down to one specific implementation. As the saying goes, always expect change - code that works fine today could introduce breaking changes the next day or even become deprecated.\nReference for the Golang code below: https://github.com/hairizuanbinnoorazman/slides-to-video\nLet\u0026rsquo;s look at some Golang code first:\npackage user ... type Store interface { Create(ctx context.Context, u User) error GetUser(ctx context.Context, ID string) (User, error) GetUserByEmail(ctx context.Context, Email string) (User, error) GetUserByActivationToken(ctx context.Context, ActivationToken string) (User, error) GetUserByForgetPasswordToken(ctx context.Context, ForgetPasswordToken string) (User, error) Update(ctx context.Context, ID string, setters ...func(*User) error) (User, error) } ... Let\u0026rsquo;s say we have some sort of storage component for user entities in an application. As long as our types have the functions listed above, such an implementation can be passed to other parts of the codebase. For the user store above, those other parts only need to depend on the Store interface from the user package. 
In the reference Golang code, there are 2 storage implementations for the user package: a Datastore backend and a MySQL backend. Other components, such as the Authenticate struct below, only hold a user.Store.\ntype Authenticate struct { Logger logger.Logger TableName string ClientID string ClientSecret string RedirectURI string Auth services.Auth UserStore user.Store } What would a similar-ish implementation look like in Python?\nThe \u0026ldquo;interface\u0026rdquo; can be replicated by subclassing the Protocol class from the typing module. The ... serves as an empty method body - we don\u0026rsquo;t need to implement the methods of an \u0026ldquo;interface\u0026rdquo;.\nfrom typing import Protocol class UserStore(Protocol): def create_user(self, u: User) -\u0026gt; None: ... def update_user(self, u: User) -\u0026gt; None: ... def delete_user(self, id: str) -\u0026gt; None: ... def get_user_by_id(self, id: str) -\u0026gt; User: ... The store above manages the following User class.\nimport uuid from datetime import datetime class User(): id: str date_created: str def __init__(self) -\u0026gt; None: self.id = str(uuid.uuid4()) self.date_created = datetime.now().strftime(\u0026#34;%y-%m-%d\u0026#34;) One possible implementation is a class that has all the above methods and keeps its state in the \u0026ldquo;memory\u0026rdquo; of the Python process (in the case of a web server, the state is kept for as long as the application remains running).\nclass MemoryUserStore(): memory_store: dict[str, User] def __init__(self) -\u0026gt; None: self.memory_store = {} def create_user(self, u: User) -\u0026gt; None: self.memory_store[u.id] = u def update_user(self, u: User) -\u0026gt; None: self.memory_store[u.id] = u def delete_user(self, id: str) -\u0026gt; None: self.memory_store.pop(id) def get_user_by_id(self, id: str) -\u0026gt; User: return self.memory_store[id] Another possible implementation would be to store the data in a JSON file.\nimport json class JSONUserStore(): internal: dict[str, str] file_name: str def 
_populate_internal(self) -\u0026gt; None: try: with open(self.file_name, \u0026#39;r\u0026#39;) as f: raw = json.load(f) except FileNotFoundError: return for i in raw: self.internal[i] = raw[i] def _persist(self) -\u0026gt; None: with open(self.file_name, \u0026#39;w\u0026#39;) as f: json.dump(self.internal, f) def __init__(self, file_name: str) -\u0026gt; None: self.internal = {} self.file_name = file_name def create_user(self, u: User) -\u0026gt; None: self._populate_internal() self.internal[u.id] = json.dumps(u.__dict__) self._persist() self.internal = {} def update_user(self, u: User) -\u0026gt; None: self._populate_internal() self.internal[u.id] = json.dumps(u.__dict__) self._persist() self.internal = {} def delete_user(self, id: str) -\u0026gt; None: self._populate_internal() self.internal.pop(id) self._persist() self.internal = {} def get_user_by_id(self, id: str) -\u0026gt; User: self._populate_internal() item = self.internal[id] self._persist() self.internal = {} processed_item = json.loads(item) fake_user = User() fake_user.id = processed_item[\u0026#34;id\u0026#34;] fake_user.date_created = processed_item[\u0026#34;date_created\u0026#34;] return fake_user We can use the following driver code to test out the above implementations:\ndef zzz(us: UserStore) -\u0026gt; None: new_user_1 = User() print(new_user_1.id) new_user_2 = User() print(new_user_2.id) us.create_user(new_user_1) us.create_user(new_user_2) gotten_new_user = us.get_user_by_id(new_user_1.id) print(\u0026#34;new_user_1 {}\u0026#34;.format(new_user_1.id)) print(\u0026#34;gotten_new_user {}\u0026#34;.format(gotten_new_user.id)) assert gotten_new_user.id == new_user_1.id, \u0026#34;id is not the same\u0026#34; return mus = MemoryUserStore() jus = JSONUserStore(\u0026#34;zz.json\u0026#34;) zzz(jus) Do note that in the bottom section, it\u0026rsquo;s trivial to switch between implementations - let\u0026rsquo;s say we want to rely on the memory store on servers with an abundance of memory 
but rely on the JSON user store, which keeps its state in files, on smaller systems such as workstations:\nmus = MemoryUserStore() jus = JSONUserStore(\u0026#34;zz.json\u0026#34;) zzz(mus) However, even though these static typing mechanisms and tools now exist in Python, they are still a pain to set up, and I feel they need to be tried on a larger codebase to see how such a codebase is affected by the tooling. In theory, static typing tools should make the developer experience on the codebase much better and simpler. However, I haven\u0026rsquo;t had the time to try that out yet - so maybe that could be done in a future blog post.\nReferences: # https://andrewbrookins.com/technology/building-implicit-interfaces-in-python-with-protocol-classes/ https://peps.python.org/pep-0544/ ","date":"21 June 2023","externalUrl":null,"permalink":"/replicating-golang-interfaces-with-static-python-run-with-mypy/","section":"Posts","summary":"After coding in both Python and Golang, I now have a very strong preference for statically typed languages. There is a certain charm and beauty in having the IDE I\u0026rsquo;m working in provide good autocomplete suggestions for the code - there is less need to keep jumping between files in the codebase just to check that function names and parameters are correct. For smaller programs, dynamically typed languages are still OK, but they get very unwieldy once they go past the hundreds-of-lines-of-code mark.\n","title":"Replicating golang interfaces with static python, run with mypy","type":"posts"},{"content":"Python is a dynamically typed language, which provides a very different developer experience compared to a statically typed language such as Golang. Python does serve as a nice introductory programming language for new developers, but as time goes by, it\u0026rsquo;s pretty easy to see why statically typed languages are much nicer to work with than dynamically typed ones. 
Due to the nature of such languages, it is easy to be \u0026ldquo;loosey\u0026rdquo; about the types of variables, which inadvertently makes the code harder to follow as codebases grow larger and larger. With such large codebases, even IDE type hints become harder to establish (the analysis either takes too long or the tooling just deems it impossible).\nNOTE: This article is just a single developer\u0026rsquo;s opinion. Feel free to disagree with it, since each person\u0026rsquo;s experience with programming languages is wildly different.\nNicely enough, we can control the chaos by slowly introducing typing into such Python programs. That makes it easier to understand what type of variable to pass into functions. The types are checked with a tool called mypy. Refer to the following website: https://mypy.readthedocs.io/. We can install it globally or in the virtual environment of each Python project.\nTyping in Python is done via type annotations. Here is an example of a function that accepts a string: it takes a single parameter of type str and returns nothing, which is declared with the None return annotation (-\u0026gt; None).\ndef function1(name: str) -\u0026gt; None: print(\u0026#34;printing {}\u0026#34;.format(name)) The function is called as follows:\nfunction1(\u0026#34;aac\u0026#34;) Do note that even with all these type annotations in place, Python will still run code that does not conform to the types, e.g.\nfunction1(123) To benefit from static typing while editing, we can install the mypy extension for Visual Studio Code. 
Reference: https://marketplace.visualstudio.com/items?itemName=matangover.mypy\nWe can set up the configuration file (mypy.ini) as follows:\n[mypy] disallow_any_unimported = True disallow_untyped_calls = True disallow_untyped_defs = True warn_return_any = True warn_unreachable = True Once it is all set up properly, a red squiggly line will appear wherever the values passed in do not match the types declared by the function.\nLet\u0026rsquo;s look at further examples. Say we want a function that accepts an integer and returns an integer instead of returning nothing.\ndef function2(lol: int) -\u0026gt; int: return lol + 5 function2(12) Let\u0026rsquo;s change things up: instead of basic types such as integers or strings, we have functions that accept objects. function4 accepts a Hoho object and does not return anything. function4a accepts an integer and returns an instantiated Hoho object.\nclass Hoho(): hoho: str santa: float def __init__(self, santaInit: int) -\u0026gt; None: self.hoho = \u0026#34;acaca\u0026#34; self.santa = santaInit def print_santa(self) -\u0026gt; None: print(\u0026#34;value of santa {}\u0026#34;.format(self.santa)) def function4(za: Hoho) -\u0026gt; None: za.print_santa() def function4a(pp: int) -\u0026gt; Hoho: return Hoho(pp) h = Hoho(123) function4(h) a4 = function4a(79) function4(a4) Alternatively, we can have a function that accepts a list of strings. 
Here is how we define the type annotation for a list of strings passed as a parameter to a function.\ndef function5(zolo: list[str]) -\u0026gt; int: ya = 0 for x in zolo: ya += 1 print(\u0026#34;item: {}\u0026#34;.format(x)) return ya function5([\u0026#34;acac\u0026#34;, \u0026#34;qwec\u0026#34;, \u0026#34;kqlmc\u0026#34;]) So far, we have covered basic types such as strings and integers, user-defined objects and lists. In Python, it is also possible for a function to accept another function, which it then calls; such a parameter is annotated with Callable, as function6 does here (function7 is a function that matches that signature).\nfrom typing import Callable def function6(qa: str, zzz: Callable[[str, str],bool]) -\u0026gt; None: p = zzz(\u0026#34;acac\u0026#34;, qa) if p: print(\u0026#34;function6 and true\u0026#34;) else: print(\u0026#34;function6 and false\u0026#34;) def function7(ya: str, zzz: str) -\u0026gt; bool: print(ya) print(zzz) if ya == \u0026#34;ya\u0026#34;: return True else: return False Let\u0026rsquo;s say we want to use an external library such as pandas. When we use such external libraries, the types don\u0026rsquo;t always come built into the package itself. In that case, the types from such external libraries would not flow easily into the Python scripts that we write.\nThe solution as of now is to install stub libraries; for pandas, there is a pandas-stubs library. 
With the pandas-stubs library installed, we can use types such as DataFrame, which is the type of object returned by functions such as pd.read_csv.\nimport pandas as pd df = pd.read_csv(\u0026#34;lol.csv\u0026#34;) def function1(data: pd.DataFrame) -\u0026gt; int: return len(data) val = function1(df) print(val) Unfortunately, the code snippets above are small and simple; they only demonstrate that typing can be introduced into Python scripts. To properly test this out, we would need to introduce typing into an actual large Python codebase. That would give it a bit of \u0026ldquo;battle testing\u0026rdquo; to show how useful it is for developers, since a large point of introducing typing into Python programs is to make the developer experience much smoother.\n","date":"14 June 2023","externalUrl":null,"permalink":"/writing-static-python-with-mypy/","section":"Posts","summary":"Python is a dynamically typed language, which provides a very different developer experience compared to a statically typed language such as Golang. Python does serve as a nice introductory programming language for new developers, but as time goes by, it\u0026rsquo;s pretty easy to see why statically typed languages are much nicer to work with than dynamically typed ones. Due to the nature of such languages, it is easy to be \u0026ldquo;loosey\u0026rdquo; about the types of variables, which inadvertently makes the code harder to follow as codebases grow larger and larger. With such large codebases, even IDE type hints become harder to establish (the analysis either takes too long or the tooling just deems it impossible).\n","title":"Writing static python with mypy","type":"posts"},{"content":"In many examples for helm charts, the general focus is on the \u0026ldquo;2nd\u0026rdquo; day operations of keeping applications running without too many issues. 
For the usual web developer, that means applications managed by Kubernetes Deployment objects, which run a set number of replicas (or a number managed by an HPA) in the Kubernetes cluster.\nHowever, what if the application needs to rely on a database? For applications that rely on a database, one important thing is that we need a way to run database migrations. The migrations could be run via SQL scripts or even binaries that run a certain set of SQL operations to set up the initial schema that our application needs.\nLet\u0026rsquo;s say our database is in the Kubernetes cluster (sometimes a controversial choice). We can simply run the migration commands from our \u0026ldquo;deployment\u0026rdquo; machine. The deployment machine could be some Jenkins server that runs a shell script which then does the SQL migration on the database. But this would mean that the database migration lives outside the application upgrade lifecycle, which makes it harder for the application\u0026rsquo;s developers to reason about application upgrades together with database schema updates.\nFortunately for us, Helm has a mechanism for this: chart hooks. Reference: https://helm.sh/docs/topics/charts_hooks/. We can apply certain Kubernetes objects (e.g. Kubernetes Jobs) to do things that need to happen before our application properly starts up - e.g. 
we can set up a Helm hook that creates a Kubernetes Job, which in our case is a database migration job.\nHere is an example that shows how we can do this: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nThe first part is to deploy a database: MariaDB (a MySQL-compatible database)\nhelm install -f db-values.yaml mariadb oci://registry-1.docker.io/bitnamicharts/mariadb That sets up a single-replica database with a set amount of resources, which our application can then rely on.\nOur application\u0026rsquo;s Helm chart is in the basicMigrate folder in the reference URL above. The important bit for the Helm hooks is the annotations on the Kubernetes Job. Do note that the database credentials set via environment variables are only for example purposes. It would be better to rely on a proper secret management system so that such credentials cannot be leaked so easily.\napiVersion: batch/v1 kind: Job metadata: name: {{ include \u0026#34;basicMigrate.fullname\u0026#34; . }}-migrate labels: {{- include \u0026#34;basicMigrate.labels\u0026#34; . | nindent 4 }} annotations: \u0026#34;helm.sh/hook\u0026#34;: pre-install,pre-upgrade \u0026#34;helm.sh/hook-weight\u0026#34;: \u0026#34;0\u0026#34; \u0026#34;helm.sh/hook-delete-policy\u0026#34;: before-hook-creation spec: backoffLimit: 5 activeDeadlineSeconds: 300 template: metadata: labels: {{- include \u0026#34;basicMigrate.labels\u0026#34; . | nindent 8 }} spec: serviceAccountName: {{ include \u0026#34;basicMigrate.serviceAccountName\u0026#34; . 
}} securityContext: {{- toYaml .Values.podSecurityContext | nindent 8 }} restartPolicy: Never containers: - name: {{ .Chart.Name }} securityContext: {{- toYaml .Values.securityContext | nindent 12 }} image: \u0026#34;{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}\u0026#34; imagePullPolicy: {{ .Values.image.pullPolicy }} command: - \u0026#34;app\u0026#34; - \u0026#34;migrate\u0026#34; resources: {{- toYaml .Values.resources | nindent 12 }} env: - name: DATABASE_USER value: \u0026#34;username\u0026#34; - name: DATABASE_PASSWORD value: \u0026#34;password\u0026#34; - name: DATABASE_HOST value: \u0026#34;mariadb.default.svc\u0026#34; - name: DATABASE_NAME value: \u0026#34;application\u0026#34; The \u0026quot;helm.sh/hook\u0026quot;: pre-install,pre-upgrade annotation makes Helm run the database migration job before our application is installed or upgraded.\nWe can install the Helm chart via the following command:\nhelm upgrade --install -f app-values.yaml basic ./basicMigrate ","date":"7 June 2023","externalUrl":null,"permalink":"/running-database-migrations-in-helm-chart/","section":"Posts","summary":"In many examples for helm charts, the general focus is on the \u0026ldquo;2nd\u0026rdquo; day operations of keeping applications running without too many issues. For the usual web developer, that means applications managed by Kubernetes Deployment objects, which run a set number of replicas (or a number managed by an HPA) in the Kubernetes cluster.\n","title":"Running database migrations in Helm chart","type":"posts"},{"content":"When building applications into Docker images, there is sometimes a need to consider the size of the images. There are multiple reasons to monitor and check this:\nIn the case where our container registry is hosted by us rather than on a public registry, the size of the container images affects the cost of storing all those artifacts. 
If we look at the private container registries that we can set up on public clouds such as Google Cloud, there is pricing per GB of storage as well as networking costs for moving container images out of the registry. A smaller image is simply faster to move around. Let\u0026rsquo;s say we have a Kubernetes cluster that needs to run the container on multiple nodes. A container with a smaller footprint takes a much shorter time to pull from the registry, while a larger image that could easily be in the gigabyte range (e.g. images that contain language runtimes) takes a much longer time to download as well as start up. One can also argue that the less stuff there is inside the container, the smaller the chance that it contains software with a security loophole. With that, it is beneficial for us to build \u0026ldquo;smaller\u0026rdquo; container images - the benefits are more evident for the infrastructure teams than for the application teams. Application teams would probably have to suffer quite a bit, since smaller container images mean \u0026ldquo;useful\u0026rdquo; stuff is removed from the container.\nLet\u0026rsquo;s demonstrate various ways of building a container for an application built with Golang.\nRefer to the following GitHub URL: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basic\nNaive Dockerfile # We can naively build the Docker image by simply using a base image that contains the Golang toolchain - naturally, it would be pretty huge (but we do get an image that we can run on another machine).\nFROM golang:1.18 WORKDIR /helloworld ADD . . RUN go build -o app . 
CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 This simply builds a container, but it includes the entire Golang toolchain (which is usually unnecessary in a production environment). We can build the container by running the following command:\ndocker build -t naive-app -f Dockerfile . Using Slim Dockerfile # The first level of reduction is a multi-stage build: compile the binary in the golang image, then copy only the binary into a Debian or Ubuntu image. Better still, we can use the \u0026ldquo;slimmed\u0026rdquo; down editions of such images via the slim tag - refer to the Dockerfile definition below.\nFROM golang:1.18 as builder WORKDIR /helloworld ADD . . RUN go build -o app . FROM debian:bookworm-slim RUN apt update \u0026amp;\u0026amp; \\ apt install -y ca-certificates \u0026amp;\u0026amp; \\ apt clean \u0026amp;\u0026amp; \\ rm -rf /var/lib/apt/lists/* WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 The important thing to note here is the need to install the ca-certificates apt package. If we skipped the installation of the ca-certificates package, we would face the following issue:\n2023/08/07 16:41:43 Hello world sample started. 2023/08/07 16:41:49 Start DoHTTPReq 2023/08/07 16:41:49 Attempting to query the following url https://www.github.com 2023/08/07 16:41:49 unable to get data from url Get \u0026#34;https://www.github.com\u0026#34;: x509: certificate signed by unknown authority 2023/08/07 16:41:49 End DoHTTPReq The application cannot query HTTPS endpoints since the image does not have the CA certificates needed to establish the chain of trust for modern websites. Once ca-certificates is installed, this becomes a non-issue.\nWe can build the Docker image by running the following command:\ndocker build -t slim-app -f slim.Dockerfile . 
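As a quick diagnostic for the x509 error above, here is a minimal Python sketch. The helper find_ca_bundle is hypothetical and not part of the linked repository: it reports which CA bundle file, if any, exists under a root directory, for example a container filesystem extracted with docker create and docker export. The candidate list is an assumption, reproduced from memory of the bundle paths that the Go crypto/x509 package probes on Linux. Go also scans certificate directories, so a None result is a strong hint rather than proof that TLS verification will fail.

```python
import os
from typing import Optional

# Assumed list: CA bundle files that the Go crypto/x509 package checks on
# Linux, reproduced from memory of the Go source (not authoritative).
CA_BUNDLE_CANDIDATES = [
    '/etc/ssl/certs/ca-certificates.crt',  # Debian/Ubuntu/Gentoo
    '/etc/pki/tls/certs/ca-bundle.crt',  # Fedora/RHEL 6
    '/etc/ssl/ca-bundle.pem',  # OpenSUSE
    '/etc/pki/tls/cacert.pem',  # OpenELEC
    '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem',  # CentOS/RHEL 7
    '/etc/ssl/cert.pem',  # Alpine Linux
]


def find_ca_bundle(root: str = '/') -> Optional[str]:
    # Return the first candidate bundle that exists under root, or None when
    # no bundle file is present (the situation that typically ends in
    # x509: certificate signed by unknown authority).
    for candidate in CA_BUNDLE_CANDIDATES:
        # Re-anchor the absolute path under root, e.g. rootfs/etc/ssl/cert.pem
        path = os.path.join(root, candidate.lstrip('/'))
        if os.path.isfile(path):
            return candidate
    return None
```

For the slim image built without the ca-certificates package, one would expect None; after installing the package, the Debian path should be reported instead.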
Using alpine Dockerfile # The next level of cutting down the size of container images is to use the alpine images. Alpine images are well known for being much smaller than the images of mainstream Linux distributions such as debian, ubuntu or centos.\nRefer to the following Dockerfile definition:\nFROM golang:1.18 as builder WORKDIR /helloworld ADD . . RUN go build -o app . FROM alpine:3.16 RUN apk add gcompat WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 The important step would be the following line:\nRUN apk add gcompat It is important to understand that the usual C tooling that Golang relies on is not available here. By default, when cgo is enabled (as it is in the golang image), a Go binary that uses packages such as net is dynamically linked against glibc, which it relies on for things like DNS resolution. Alpine does not ship glibc; it uses musl instead, which is different enough that binaries linked against glibc will not run on it without a compatibility layer such as gcompat. If we didn\u0026rsquo;t add that line, we would face the following issue (the missing file is the glibc dynamic loader that the binary asks for, not the binary itself):\nexec /helloworld/app: no such file or directory A further explanation on this can be found on this link:\nhttps://stackoverflow.com/questions/66963068/docker-alpine-executable-binary-not-found-even-if-in-path\nAlternatively, we can build the binary with cgo disabled (CGO_ENABLED=0), which produces a statically linked binary, and then copy it into the alpine base container image. This should work with little to no issue.\nFROM golang:1.18 as builder WORKDIR /helloworld ADD . . RUN CGO_ENABLED=0 go build -o app . FROM alpine:3.16 WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 To build the container, we can run the following command:\ndocker build -t alpine-app -f alpine.Dockerfile . 
Additional Info on Golang and Glibc # As further proof that the default Golang compilation depends on glibc, we can build the following Dockerfile with the application mentioned in the github link. It compiles with golang:1.20 but runs the binary on a much older debian release:\nFROM golang:1.20 as builder WORKDIR /helloworld ADD . . RUN go build -o app . FROM debian:jessie-20170606 WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 When we attempt to start the application from the built image, we would face the following issue:\n/helloworld/app: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32\u0026#39; not found (required by /helloworld/app) /helloworld/app: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34\u0026#39; not found (required by /helloworld/app) The binary requires glibc symbol versions (GLIBC_2.32 and GLIBC_2.34) that the older libc in the jessie image does not provide. If we build the Golang binary without cgo:\nFROM golang:1.20 as builder WORKDIR /helloworld ADD . . RUN CGO_ENABLED=0 go build -o app . FROM debian:jessie-20170606 WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 It would work as normal.\nUsing distroless Dockerfile # The final frontier for cutting unnecessary stuff out of the container, so that it only holds what the application needs to run, is to rely on Distroless base images. Distroless base images are minimalistic container images with almost nothing inside them, not even a shell, which makes debugging really difficult. But then again, do we really need debugging tools in production settings? Shouldn\u0026rsquo;t we use observability tools to know what\u0026rsquo;s happening with our application?\nHere is a Dockerfile that relies on Distroless base images:\nFROM golang:1.18 as builder WORKDIR /helloworld ADD . . RUN CGO_ENABLED=0 go build -o app . 
FROM gcr.io/distroless/static-debian11:nonroot WORKDIR /helloworld COPY --from=builder /helloworld/app /helloworld/app CMD [\u0026#34;/helloworld/app\u0026#34;] EXPOSE 8080 Refer to the following github repo for more information with regards to distroless images: https://github.com/GoogleContainerTools/distroless\nWe can simply follow the instructions and examples from the distroless github link. Note that despite their small size, distroless images are not derived from Alpine; they are built from debian packages (hence the static-debian11 tag). The static variant contains little more than CA certificates, timezone data and a few users such as nonroot, with no libc at all, which is why the binary has to be built with CGO_ENABLED=0.\nWe can build the container by running the following command:\ndocker build -t distroless-app -f distroless.Dockerfile . Quick comparisons # Once we build all the containers from the above Dockerfiles, we can finally compare the sizes and see the benefits of choosing the right base image.\nnaive-app latest eeffd65a394a 54 seconds ago 972MB distroless-app latest c5d38cc0c66e 2 hours ago 9.34MB alpine-app latest 988b265abc84 2 hours ago 15.1MB slim-app latest e785101ac2c8 2 hours ago 91.5MB From the above, distroless images are the smallest, followed by alpine and then slim based images. Even the slim image is roughly ten times smaller than the naive approach of building and running the container on the base image that contains the Golang toolchain.\nAlthough smaller images have a smaller footprint, we should also be aware of the difficulty of dealing with alpine or distroless base images. We are dealing with a different set of tooling: apt is not available out of the box (alpine uses apk, and distroless has no package manager at all). We need to relearn how to do things such as debugging or running networking tools.\n","date":"31 May 2023","externalUrl":null,"permalink":"/using-smaller-base-images-for-applications-slim-images-alpine-images-distroless-images/","section":"Posts","summary":"When building applications in docker images, there is sometimes a need to consider the size of the containers. 
There are multiple reasons for us to monitor and check this:\nIn the case where we host our own container registry rather than relying on public registries, the size of the container affects the cost of storing all those artifacts. For example, the private container registries that we can set up on public clouds such as Google Cloud charge per GB of storage, as well as networking costs for moving container images out of the registry. A smaller image is also simply faster to move around. Say we have a Kubernetes cluster that needs to run the container on multiple nodes. A container with a smaller footprint takes much less time to pull from the registry, while a larger one that could easily be in the gigabyte range (e.g. images that contain language runtimes) takes much longer to download as well as start up. One can also argue that the less stuff there is inside the container, the smaller the chance that the container includes something with a security loophole. With that, it is beneficial for us to build “smaller” container images. The benefits are more evident for infrastructure teams than for application teams; application teams would probably suffer quite a bit, since smaller container images mean that “useful” stuff is removed from the container.\n","title":"Using smaller base images for applications, slim images? alpine images? distroless images","type":"posts"},{"content":"When one thinks of deploying stuff into Kubernetes, one of the usual ways to do so is through Kubernetes manifest files. 
Kubernetes manifest files describe the various resources in a Kubernetes cluster; some commonly used examples are Deployment, Configmap, Secret, Service and even Ingress resources/objects.\nHowever, managing a whole bunch of Kubernetes resources is usually quite troublesome, and there is usually a need for templating when getting such resources into clusters. Helm is a tool that came up to solve this. With helm, we can package a bunch of kubernetes resources into a single \u0026ldquo;package\u0026rdquo; (it\u0026rsquo;s simply a tar file) and deploy the whole lot into the cluster, so we won\u0026rsquo;t miss a resource by accident.\nHowever, there are cases where the Kubernetes manifest files generated by helm don\u0026rsquo;t fully fit our requirements. The Helm chart might not be flexible enough to accept some of the things we need (e.g. setting additional annotations/labels requires the author of the Helm chart to have exposed that field as a variable in the values.yaml that is passed in via the Helm cli tool).\nAlso, let\u0026rsquo;s pose another scenario where the maintainer of a DC needs to deploy 50 helm charts on the cluster. Let\u0026rsquo;s say the cluster is \u0026ldquo;limited\u0026rdquo; in resources and we need to define lower initial replica counts at the beginning. It would be a pain for the maintainer of the DC to modify the values.yaml fed to each of the helm charts, since we can\u0026rsquo;t assume that the replicas field in the values.yaml is the same across the helm charts. If we look at some of the open source helm charts, each of them sets it differently\u0026hellip;\ne.g. https://artifacthub.io/packages/helm/bitnami/minio\nstatefulsets: replicaCount: 4 e.g. https://artifacthub.io/packages/helm/grafana/grafana\nreplicas: 1 e.g. 
https://artifacthub.io/packages/helm/bitnami/wordpress\nreplicaCount: 1 Some helm charts even have \u0026ldquo;sub\u0026rdquo; components (they declare multiple deployments), and those also have replicas that need to be managed.\nWith all that in mind, we can\u0026rsquo;t rely on a standardized values.yaml. However, we do know that the generated yaml consists of valid Kubernetes manifest files, and those are standardized. We can technically do some yaml manipulation on them and still have helm manage the installation.\nHelm has a flag called --post-renderer that accepts an executable: helm passes the rendered Kubernetes manifests to it on standard input and uses whatever it writes to standard output as the manifests to install.\nRefer to the following example application and Helm chart: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicHelm\nWe can use a common yaml manipulation tool, yq, to manipulate the generated yaml. Say we have a shell script, yahoo.sh, such as the following:\n#!/bin/bash yq eval \u0026#39;.metadata.annotations.cool = \u0026#34;miao\u0026#34;\u0026#39; - Don\u0026rsquo;t forget to make yahoo.sh executable (e.g. chmod +x yahoo.sh). We can then run the following command:\nhelm template zolo ./basic-app --post-renderer ./yahoo.sh It should generate output such as the following (this is just a snippet of one of the resources; the full output is pretty long):\n... # Source: basic-app/templates/deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: zolo-basic-app labels: helm.sh/chart: basic-app-0.1.0 app.kubernetes.io/name: basic-app app.kubernetes.io/instance: zolo app.kubernetes.io/version: \u0026#34;1.16.0\u0026#34; app.kubernetes.io/managed-by: Helm annotations: cool: miao spec: replicas: 1 selector: ... Take note of the annotations added according to our shell script.\nAlthough it is possible to use yq to manipulate the output Kubernetes manifests, it is usually not specific enough. 
yq is a generic tool that is fine for manipulating generic yaml files. However, if we need more specificity, it might be better to use kustomize, which is built to manipulate kubernetes manifest files. It understands Kubernetes resources and provides way more flexibility (it even supports JSON patches).\nLet\u0026rsquo;s say we want to set the replicas for all deployment objects in the generated kubernetes manifest files. We first define a shell script that writes the generated kubernetes manifests to a physical file. We can then run kustomize on the generated file, which prints the post-rendered yaml.\nkustomize.sh\n#!/bin/bash cat \u0026lt;\u0026amp;0 \u0026gt; all.yaml kustomize build . \u0026amp;\u0026amp; rm all.yaml Here is the kustomization.yaml (kustomize build looks for a file with this name in the target directory) that we can apply on the generated all.yaml\n# Refer to the following documentation page: # https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/ # # Comment out resources according to which ones are to be applied resources: - all.yaml # - alteredclient.yaml apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization patches: - patch: |- - op: replace path: /spec/replicas value: 5 target: group: apps version: v1 kind: Deployment We can run the above with the following:\nhelm template zolo ./basic-app --post-renderer ./kustomize.sh With that, all deployment objects generated from the chart would have 5 replicas.\nOne question I immediately pondered while checking out this functionality is \u0026ldquo;why not simply pipe the rendered output through scripts?\u0026rdquo;. An example:\nhelm template zolo ./basic-app | yq eval \u0026#39;.metadata.annotations.cool = \u0026#34;miao\u0026#34;\u0026#39; - Technically, this is possible, but only because we have been using the template subcommand so far. Many people instead use the upgrade or install subcommands provided by the helm cli tool. 
An example would be something like this:\nhelm upgrade --install zolo ./basic-app We would use this command so that we can make use of some of helm\u0026rsquo;s application lifecycle tooling, namely the pre-install and post-install hooks. Plain generated Kubernetes manifests don\u0026rsquo;t carry any ordering when we apply them to the cluster, but we can set an ordering within the helm chart. To keep all of helm\u0026rsquo;s application lifecycle features, we can simply add the --post-renderer flag, which lets us continue doing installations while modifying the generated Kubernetes manifests.\nhelm upgrade --install --post-renderer ./kustomize.sh zolo ./basic-app ","date":"24 May 2023","externalUrl":null,"permalink":"/altering-outputs-of-helm-installations-with-post-renderer-via-kustomize/","section":"Posts","summary":"When one thinks of deploying stuff into Kubernetes, one of the usual ways to do so is through Kubernetes manifest files. Kubernetes manifest files describe the various resources in a Kubernetes cluster; some commonly used examples are Deployment, Configmap, Secret, Service and even Ingress resources/objects.\n","title":"Altering outputs of helm installations with post-renderer via kustomize","type":"posts"},{"content":"When building an application, a common way to alter and set the running properties of the application is to use configuration files written in JSON or YAML. This is the case whether the application is deployed on a Virtual Machine or in a container within a Kubernetes Cluster. 
The general assumption is that the configuration file does not change that often. If the configuration file does change, the usual way to have the application pick up the new configuration is to stop the running application and start it once more.\nRestarting an application deployed in a Virtual Machine is relatively easy to handle. If the application is managed via systemd, we can simply use the systemctl command line tool to restart the application once we have altered the configuration files on the virtual machine. However, this is done slightly differently on a Kubernetes cluster.\nLet\u0026rsquo;s assume that we deploy our applications into the Kubernetes cluster using plain old yaml manifest files. In that case, we first alter the manifest file for the configmap that holds the application configuration. Our application is defined as a Kubernetes deployment, and the way to \u0026ldquo;restart\u0026rdquo; a pod created via a Kubernetes deployment is to delete the pod; a new pod is recreated automatically, and the configuration is reloaded when the application starts in the new pod. However, do note the manual action that we need to take here, which is to delete the pod after the new configmap has been applied to the cluster.\nNow, let\u0026rsquo;s say we manage our application via helm charts instead. Ideally, an upgrade of the helm chart on the cluster should be sufficient to ensure that the application uses the new configuration from the updated configmaps. It wouldn\u0026rsquo;t make sense for us to \u0026ldquo;upgrade\u0026rdquo; the application managed by the helm chart and then delete the pod just to ensure that the pod picks up the new configuration. 
If we simply upgrade the helm chart as it is, the configmaps will be updated but the pods and deployment objects will not (as long as there aren\u0026rsquo;t any changes in the rendered deployment manifest). The pods would continue running with the old configuration.\nHere, we can make use of a useful property of the deployment object: if the annotations in its pod template change, the pod template is considered \u0026ldquo;different\u0026rdquo;, and hence the deployment starts a new rollout. We can apply this here and observe how it works.\nLet\u0026rsquo;s say we have a helm chart which accepts a configuration yaml that is passed in during the helm chart installation. The image to be deployed, yahoo:v30, is also defined in this configuration yaml file, which we save as aa.yaml.\nThe helm chart that we are going to reference can be found in the following github repo:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicHelm\nimage: repository: gcr.io/xxx/yahoo tag: v30 appConfig: | lol: caca miao: zzz Within the chart\u0026rsquo;s deployment.yaml, we define an annotation on the pod template that holds a hash of appConfig, as follows:\n... spec: {{- if not .Values.autoscaling.enabled }} replicas: {{ .Values.replicaCount }} {{- end }} selector: matchLabels: {{- include \u0026#34;basic-app.selectorLabels\u0026#34; . | nindent 6 }} template: metadata: annotations: configmap-hash: {{ .Values.appConfig | sha256sum }} {{- with .Values.podAnnotations }} {{- toYaml . | nindent 8 }} {{- end }} labels: {{- include \u0026#34;basic-app.selectorLabels\u0026#34; . | nindent 8 }} spec: {{- with .Values.imagePullSecrets }} imagePullSecrets: {{- toYaml . | nindent 8 }} {{- end }} ... 
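Helm templates use the Sprig function library, whose sha256sum returns the hex-encoded SHA-256 digest of its input, so any edit to appConfig produces a different configmap-hash annotation, and that changed pod template is what makes the Deployment roll out new pods. A small hedged Go sketch of the same computation (configHash is a hypothetical helper name, and it assumes appConfig renders as the two-line block shown, including its trailing newline) shows that adding a single line changes the value completely:

```go
package main

import (
	`crypto/sha256`
	`encoding/hex`
	`fmt`
)

// configHash hex-encodes the SHA-256 digest of the input, mirroring what the
// sha256sum template function does with .Values.appConfig.
func configHash(appConfig string) string {
	sum := sha256.Sum256([]byte(appConfig))
	return hex.EncodeToString(sum[:])
}

func main() {
	// appConfig from the original aa.yaml (assumed to keep its trailing newline).
	before := `lol: caca
miao: zzz
`
	// appConfig after adding anotherConfig: 12
	after := before + `anotherConfig: 12
`
	fmt.Println(`before:`, configHash(before))
	fmt.Println(`after: `, configHash(after))
	fmt.Println(`new rollout expected:`, configHash(before) != configHash(after))
}
```

Conversely, re-running helm upgrade with an unchanged appConfig leaves the annotation, and therefore the pods, untouched (assuming nothing else in the pod template changed).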
To test this out, we can run the following command to install it:\nhelm upgrade -f aa.yaml --install basic-app ./basic-app This would get a pod running:\n% kubectl get pods NAME READY STATUS RESTARTS AGE basic-app-584f4bd9df-htjts 1/1 Running 0 5m Next, we update the appConfig field in our configuration file aa.yaml:\nUpdated aa.yaml\u0026hellip;\nimage: repository: gcr.io/xxx/yahoo tag: v30 appConfig: | lol: caca miao: zzz anotherConfig: 12 We then rerun the upgrade to bump up the helm chart:\nhelm upgrade -f aa.yaml --install basic-app ./basic-app We should observe the new pod being created.\n% kubectl get pods NAME READY STATUS RESTARTS AGE basic-app-584f4bd9df-htjts 1/1 Running 0 7m basic-app-6847c48b69-cwb7b 0/1 Running 0 3s With that, we can ensure that the new configuration is applied to our application in cases where the application only reads its configuration on initial start up.\n","date":"17 May 2023","externalUrl":null,"permalink":"/updating-configuration-in-kubernetes-pods-managed-via-helm/","section":"Posts","summary":"When building an application, a common way to alter and set the running properties of the application is to use configuration files written in JSON or YAML. This is the case whether the application is deployed on a Virtual Machine or in a container within a Kubernetes Cluster. 
The general assumption is that the configuration file does not change that often. If the configuration file does change, the usual way to have the application pick up the new configuration is to stop the running application and start it once more.\n","title":"Updating configuration in Kubernetes pods managed via Helm","type":"posts"},{"content":"In the real world, we often have to deal with traffic loads so large that we may need to store data across a cluster of machines. For applications that barely need to manage data, we can simply rely on existing products that scale out the number of application replicas, which can then serve traffic pretty easily. However, what about applications that rely on a database? We need our database server cluster to scale out accordingly as well (there are limits to scaling vertically in most cloud providers after all).\nInitial naive approach to distribute data between server nodes # There are numerous approaches to do this. One way would be to leverage some sort of metatable database to store references or metadata about the data (this stores the \u0026ldquo;primary key\u0026rdquo;), while the actual data is stored in separate data nodes. This approach allows for efficient management and retrieval of data across the distributed system. The metatable database serves as a central repository that maintains information such as the location and characteristics of the data, while the data nodes store the actual content. This is how Hadoop (HDFS) does things: if you were to deploy a Hadoop cluster, you would need to deploy name nodes (which manage the metadata of where each data point is stored) as well as data nodes. 
However, this design comes with a massive flaw: if the name node ever goes down (and it is not run in a high availability setup), the hadoop cluster is essentially rendered \u0026ldquo;useless\u0026rdquo;, since none of the clients would know where each data point is stored.\nAnother approach to distributing data across multiple servers is to use a simple hashing algorithm. In this approach, the data to be stored is hashed using a hashing function, and the resulting hash value is used to determine the server to which the data should be assigned. The idea behind this approach is that by evenly distributing the data based on its hash value, the workload can be balanced across the servers. There is no dependency on a central server to determine where each data point is stored, which is already a big plus. The initial naive approach is to take the hash and run a modulus operation on it to pick the server that stores the piece of data.\npackage main import ( \u0026#34;crypto/md5\u0026#34; \u0026#34;encoding/hex\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;math/big\u0026#34; \u0026#34;strconv\u0026#34; ) var ( data []string = []string{} nodeAssignment []int = []int{} initialNodeCount = 3 dataCount = 1000 ) func hasher(v string) int64 { bi := big.NewInt(0) h := md5.New() h.Write([]byte(v)) hexstr := hex.EncodeToString(h.Sum(nil)) bi.SetString(hexstr, 16) value := bi.Int64() if value \u0026lt; 0 { value = value * -1 } return value } func main() { for i := 0; i \u0026lt; dataCount; i++ { data = append(data, \u0026#34;weatherinsingaporehot\u0026#34;+strconv.Itoa(i)) } for _, v := range data { value := hasher(v) nodeAssign := value % int64(initialNodeCount) nodeAssignment = append(nodeAssignment, int(nodeAssign)) } fmt.Println(nodeAssignment) } In the above code snippet, we generate 1000 data points and allocate them across 3 servers. 
We can change the number of nodes to simulate a bigger cluster that we need to balance our data across. An important thing to check is how balanced our data is across our servers. The function to do so is a simple one that adds counts to a map.\nfunc dataBalancingCounter(assignments []int) map[string]int { hoho := map[string]int{} for _, v := range assignments { hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] = hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] + 1 } return hoho } The full golang code with this function would look like this:\npackage main import ( \u0026#34;crypto/md5\u0026#34; \u0026#34;encoding/hex\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;math/big\u0026#34; \u0026#34;strconv\u0026#34; ) var ( data []string = []string{} nodeAssignment []int = []int{} initialNodeCount = 3 dataCount = 1000 ) func dataBalancingCounter(assignments []int) map[string]int { hoho := map[string]int{} for _, v := range assignments { hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] = hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] + 1 } return hoho } func hasher(v string) int64 { bi := big.NewInt(0) h := md5.New() h.Write([]byte(v)) hexstr := hex.EncodeToString(h.Sum(nil)) bi.SetString(hexstr, 16) value := bi.Int64() if value \u0026lt; 0 { value = value * -1 } return value } func main() { for i := 0; i \u0026lt; dataCount; i++ { data = append(data, \u0026#34;weatherinsingaporehot\u0026#34;+strconv.Itoa(i)) } for _, v := range data { value := hasher(v) nodeAssign := value % int64(initialNodeCount) nodeAssignment = append(nodeAssignment, int(nodeAssign)) } fmt.Printf(\u0026#34;split of data:\\n%v\\n\u0026#34;, dataBalancingCounter(nodeAssignment)) } The output of the above code is:\nsplit of data: map[node0:323 node1:341 node2:336] The split of data across the nodes is actually not too bad, considering that we\u0026rsquo;re not storing any metadata about where each data point is across the 
entire cluster. Upon receiving any request, each node can compute the hash itself and forward the request to the node that serves the required data point.\nI wish we could simply end things here, but the next section is probably the main reason why I\u0026rsquo;m writing this post in the first place.\nAdding a new node (now you need to rebalance!) # Let\u0026rsquo;s say we are in an \u0026ldquo;emergency\u0026rdquo;: we realize that our data storage nodes are at 60-80% capacity and somewhat close to falling over due to the workload put on them. The natural expectation is that we can increase the number of data storage nodes to relieve the rest of the nodes and improve performance across the entire cluster. The new node should be able to take up some of the load and start serving the required data to incoming traffic. However, in order to do this, the new storage node has to hold some of the data from the other nodes. How else would it be able to serve the traffic if it doesn\u0026rsquo;t hold the data?\nThe whole process of transferring data between nodes during the addition, removal or replacement of nodes is called data rebalancing. Data rebalancing across servers in a distributed system is a complex and challenging task, but it is crucial for maintaining system performance, load distribution, and fault tolerance. As the system evolves and scales, the data distribution among servers may become imbalanced due to various factors such as server failures, additions, or changes in data access patterns. This imbalance can lead to overloaded servers, increased latency, and inefficient resource utilization. Data rebalancing aims to address these issues by redistributing the data across servers in a more equitable and efficient manner. 
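How much data has to move under modulo placement can also be worked out on paper. Assuming the hash values are spread uniformly, a key stays on its node only when h mod n equals h mod m, and that pattern repeats every lcm(n, m) values, so counting over one period gives the exact fraction. The sketch below (stayFraction and gcd are hypothetical helper names) does that count: going from 3 to 4 nodes keeps only 3 out of every 12 residues, so about 75% of keys move, in line with the 76.8% measured later in this post.

```go
package main

import (
	`fmt`
	`strconv`
)

// gcd returns the greatest common divisor of a and b (Euclid).
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// stayFraction returns the fraction of uniformly distributed hash values h
// for which h mod from == h mod to, i.e. the keys that do not need to move.
func stayFraction(from, to int) float64 {
	period := from / gcd(from, to) * to // lcm(from, to)
	stay := 0
	for r := 0; r < period; r++ {
		if r%from == r%to {
			stay++
		}
	}
	return float64(stay) / float64(period)
}

func main() {
	// The first line printed is: 3 -> 4 nodes: 75.0% of keys move
	for n := 3; n <= 6; n++ {
		moved := strconv.FormatFloat((1-stayFraction(n, n+1))*100, 'f', 1, 64)
		fmt.Println(n, `->`, n+1, `nodes:`, moved+`% of keys move`)
	}
}
```

In general, going from n to n+1 nodes moves n/(n+1) of the keys under modulo placement, whereas the minimum needed to fill the new node is about 1/(n+1), which is the behaviour consistent hashing aims for.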
However, achieving seamless data rebalancing is difficult due to the need to minimize disruption to ongoing operations, ensure data consistency, and optimize network and storage resources.\nAnother point to note is that once the node count changes, our hashing scheme no longer maps data to the server it currently lives on, and it wouldn\u0026rsquo;t make sense for our data systems to remember the previous mappings either. Let\u0026rsquo;s take our previous naive approach and use it to reassign the data to a larger cluster of nodes. One additional calculation we need is the percentage of data that needs to be moved around.\npackage main import ( \u0026#34;crypto/md5\u0026#34; \u0026#34;encoding/hex\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;math/big\u0026#34; \u0026#34;strconv\u0026#34; ) var ( data []string = []string{} nodeAssignment []int = []int{} nodeReassignment []int = []int{} initialNodeCount = 3 finalNodeCount = 4 dataCount = 1000 ) func dataBalancingCounter(assignments []int) map[string]int { hoho := map[string]int{} for _, v := range assignments { hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] = hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] + 1 } return hoho } func hasher(v string) int64 { bi := big.NewInt(0) h := md5.New() h.Write([]byte(v)) hexstr := hex.EncodeToString(h.Sum(nil)) bi.SetString(hexstr, 16) value := bi.Int64() if value \u0026lt; 0 { value = value * -1 } return value } func main() { for i := 0; i \u0026lt; dataCount; i++ { data = append(data, \u0026#34;weatherinsingaporehot\u0026#34;+strconv.Itoa(i)) } for _, v := range data { value := hasher(v) nodeAssign := value % int64(initialNodeCount) nodeAssignment = append(nodeAssignment, int(nodeAssign)) nodeReassign2 := value % int64(finalNodeCount) nodeReassignment = append(nodeReassignment, int(nodeReassign2)) } changeRequired := 0 for i := range nodeAssignment { if nodeAssignment[i] != nodeReassignment[i] { changeRequired = changeRequired + 1 } } fmt.Printf(\u0026#34;%v of the 
data is changed\\n\u0026#34;, float64(changeRequired)/float64(dataCount)*100) fmt.Printf(\u0026#34;split of data:\\n%v\\n\u0026#34;, dataBalancingCounter(nodeReassignment)) } The output of the above code:\n76.8 of the data is changed split of data: map[node0:240 node1:251 node2:244 node3:265] Observe the pretty large percentage of data that needs to be moved around in order to rebalance our data onto the new cluster size: it\u0026rsquo;s 76.8% of the data. With that much data to move, it will take a while for the data to be rebalanced across the server nodes.\nThis is definitely an area to be optimized, and it would be nice to have something that can help with this. (Of course there is; that\u0026rsquo;s kind of the whole point of writing this blog post\u0026hellip;)\nConsistent Hashing # Consistent hashing is one of the algorithms designed to tackle this issue head on. The algorithm was originally proposed by researchers mainly to solve load balancing issues (for distributed caching on the web), but the industry found that it applies to plenty of other distributed systems as well.\nThis post is an attempt to understand how consistent hashing can help with rebalancing data in distributed data systems. 
Here are some reference links that proved useful in understanding the need for consistent hashing.\nhttp://highscalability.com/blog/2023/2/22/consistent-hashing-algorithm.html The following blog post from toptal, https://www.toptal.com/big-data/consistent-hashing, explains it best (I understood it much better after reading the data sections in the middle of the page). Most explanations of consistent hashing only give an abstract idea of what it is trying to accomplish, and those abstract ideas are still somewhat difficult to translate into an implementation.\nI\u0026rsquo;ll try to give a slightly tldr version here, but it may be clearer to you in code. The first step of consistent hashing is to set up the idea of a hashring. Like the toptal blog mentioned, we can imagine a circle on which we place both the data and the servers.\nTo place the data and servers on the circle, we run them through our usual hashing function. Since we\u0026rsquo;re representing the hashring as a circle, it helps to imagine that we\u0026rsquo;re computing the angle at which each data point or server is placed. Let\u0026rsquo;s pretend that we\u0026rsquo;re working with 3 servers here. For data, we can hash the primary key, which determines which server stores the data. For servers, we can hash the server ids, which we then map onto the hash ring.\nOnce we have placed our servers on the hashring, we can place our data points on it too. We then need a methodology for assigning data points to servers. 
The simplest rule is to walk clockwise from the data point and assign it to the first server we reach, i.e. the first server whose hashed position is greater than or equal to the hashed data point.\nWhen we hash our servers onto the hash ring, there is a chance that they all end up clumped in one section of the ring. That would make it hard to ensure that our data points are assigned as evenly as possible across the nodes. To counter this, we can give each server several points on the hash ring. These extra points are called virtual nodes, a term used across the industry. The Cassandra database, for example, relies heavily on this concept and shortens the term to vnodes.\nRather than going on and on about how the consistent hashing algorithm works, we can simply look at some code to see how it performs. We\u0026rsquo;ll do the same thing as with the previous modulus approach: calculate the balance of the data across our server nodes as well as the percentage of data that needs to migrate to other nodes. 
package main import ( \u0026#34;crypto/md5\u0026#34; \u0026#34;encoding/hex\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;math/big\u0026#34; \u0026#34;sort\u0026#34; \u0026#34;strconv\u0026#34; ) var ( data []string = []string{} consistentAssignment []int = []int{} consistentReassignment []int = []int{} initialNodeCount = 3 finalNodeCount = 4 dataCount = 1000 virtualNodeMultiplier int64 = 10 ) type logicalServer struct { Node int64 Name string Angle float64 } type logicalServers []logicalServer func (l logicalServers) Len() int { return len(l) } func (l logicalServers) Less(i, j int) bool { return l[i].Angle \u0026lt; l[j].Angle } func (l logicalServers) Swap(i, j int) { l[i], l[j] = l[j], l[i] } func (l logicalServers) Sort() { sort.Sort(l) } func createLogicalServerList(virtualNodeMultiplier, nodes int64) []logicalServer { ls := []logicalServer{} for i := 0; i \u0026lt; int(nodes); i++ { for j := 0; j \u0026lt; int(virtualNodeMultiplier); j++ { nodeName := \u0026#34;node\u0026#34; + strconv.Itoa(i) + \u0026#34;-\u0026#34; + strconv.Itoa(j) ls = append(ls, logicalServer{ Node: int64(i), Name: nodeName, Angle: float64(hasher(nodeName) % 360.0), }) } } logicalServers(ls).Sort() return ls } func consistentAssign(ls []logicalServer, v int64) int64 { zz := float64(v % 360.0) initialAssign := -1 for _, k := range ls { if zz \u0026gt; k.Angle { initialAssign = int(k.Node) continue } return k.Node } return int64(initialAssign) } func dataBalancingCounter(assignments []int) map[string]int { hoho := map[string]int{} for _, v := range assignments { hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] = hoho[\u0026#34;node\u0026#34;+strconv.Itoa(v)] + 1 } return hoho } func hasher(v string) int64 { bi := big.NewInt(0) h := md5.New() h.Write([]byte(v)) hexstr := hex.EncodeToString(h.Sum(nil)) bi.SetString(hexstr, 16) value := bi.Int64() if value \u0026lt; 0 { value = value * -1 } return value } func main() { for i := 0; i \u0026lt; dataCount; i++ { data = append(data, 
\u0026#34;weatherinsingaporehot\u0026#34;+strconv.Itoa(i)) } initialLogicalServerList := createLogicalServerList(virtualNodeMultiplier, int64(initialNodeCount)) finalLogicalServerList := createLogicalServerList(virtualNodeMultiplier, int64(finalNodeCount)) for _, v := range data { value := hasher(v) nodeAssign5 := consistentAssign(initialLogicalServerList, value) consistentAssignment = append(consistentAssignment, int(nodeAssign5)) nodeAssign6 := consistentAssign(finalLogicalServerList, value) consistentReassignment = append(consistentReassignment, int(nodeAssign6)) } consistentChangeRequired := 0 for i := range consistentAssignment { if consistentAssignment[i] != consistentReassignment[i] { consistentChangeRequired = consistentChangeRequired + 1 } } fmt.Printf(\u0026#34;%v of the data is changed\\n\u0026#34;, float64(consistentChangeRequired)/float64(dataCount)*100) fmt.Printf(\u0026#34;split of data for consistent:\\n%v\\n\u0026#34;, dataBalancingCounter(consistentReassignment)) } The output of the above code:\n25.2 of the data is changed split of data for consistent: map[node0:311 node1:253 node2:184 node3:252] Notice the big drop in the percentage of data points that change nodes: 25.2% compared to 76.8% with the modulus approach. The drop is entirely thanks to the different algorithm. The difference is 51.6 percentage points. If the same percentages held at 1,000,000 data points, that would be about 516,000 fewer records to migrate, so the savings matter far more at large scale than at small scale.\nThe above consistent hashing implementation is far from perfect. Here, we used a simple linear search to find the node for each data point, but this is definitely a case where a binary search over the sorted ring would let us skip the redundant comparisons.\nConclusion # The above is simply one small segment of the distributed systems world. 
Distributed systems are generally very hard to build and manage and usually require a team of experts. The need is great enough that Kubernetes even has a concept (operators) for building applications that aim to replicate what these experts do. In a future blog post, I will probably build out a cluster of servers (representing a key-value store cluster) that distributes and rebalances data across itself. However, that will take a while to build, as there are other concepts to understand as well (note: we didn\u0026rsquo;t even talk about leader election, a topic that comes up often in the distributed systems world).\n","date":"10 May 2023","externalUrl":null,"permalink":"/consistent-hashing-implementation-in-golang/","section":"Posts","summary":"In the real world, we often have to deal with such large traffic loads that we may need to store data across a cluster of machines. If our applications barely need to deal with and manage data, we can simply rely on existing products that scale out the number of application replicas, which can then serve traffic pretty easily. However, what about applications that rely on a database? We need our database server cluster to scale out accordingly as well (there are limits to vertical scaling in most cloud providers, after all).\n","title":"Consistent Hashing Implementation in Golang","type":"posts"},{"content":"This is more of a reminder post for me that every aspect of application development is critical and deserves sufficient thought. This time around, it\u0026rsquo;s database migrations within applications.\nAs a matter of convenience, I use a Golang ORM library called GORM to handle database interactions. It is definitely a convenient way to manage and handle database records. 
Even if people complain about how bad ORMs are, at the end of the day you still want to handle the data coming into the application as structs with well-defined types. Even with the raw MySQL Golang libraries, you would still write code that maps the raw database responses to Golang structs before they are passed around the application.\nHowever, this post isn\u0026rsquo;t really about the ORM portion of the library but about the database migration part. The GORM library comes with functionality that performs database migrations based purely on Golang structs. All we need to do is annotate the structs that hold the data we wish to store with some GORM struct tags. GORM then uses that information to create the tables and columns needed to store the data in the database. The function that handles this is AutoMigrate. https://gorm.io/docs/migration.html\nIn general, the AutoMigrate function works fine in most cases. At the end of the day, you want the latest schema that works with your application. One of its biggest benefits is that it lets you perform database migrations without writing SQL migration scripts, which is usually a major pain. Database migrations are one of the main reasons why applications cannot simply be updated:\nThere is too much data in the database, and a naive migration would result in an outage. A table that the application uses needs to be broken up, which may require another round of migrations. A bad migration caused by a programming error means the database schema has to be reverted. The deployed application and its database schema are extremely outdated. 
It\u0026rsquo;s impossible to update it unless one takes old binaries and keeps swapping/upgrading them until the database schema reaches a version from which the application can be updated to the latest release. The AutoMigrate function nudges developers away from thinking about such scenarios. However, reality always strikes back at the worst possible time, and it\u0026rsquo;s better to have such capabilities in place than not at all.\nAn alternative for handling database migrations (a better one, in my opinion) is something like golang-migrate: https://github.com/golang-migrate/migrate. The library provides a CLI, but we can also embed its functionality into our application. One of its primary capabilities is a migrations table in the database that tracks which schema version we are on. It also lets us apply schema changes one at a time. We can jump through multiple migrations in one go, or step through them one migration at a time to make sure the database does not suffer an outage.\nHere is one of my many sample Golang applications, which uses the golang-migrate library: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nFor reference, here are the other posts about building the slides to video application:\nLessons from building Slides to Video App - Part 1 CORS with Golang Microservices and Elm Frontend is difficult ","date":"3 May 2023","externalUrl":null,"permalink":"/rethinking-migrations-in-golang-applications/","section":"Posts","summary":"This is more of a reminder post for me that every aspect of application development is critical and sufficient thought should be put behind it. 
This time around, it’s on database migration within applications.\n","title":"Rethinking migrations in Golang Applications","type":"posts"},{"content":"","date":"3 May 2023","externalUrl":null,"permalink":"/categories/slides-to-video/","section":"Article Categories","summary":"","title":"Slides-to-Video","type":"categories"},{"content":"","date":"3 May 2023","externalUrl":null,"permalink":"/tags/slides-to-video/","section":"Technology Tags","summary":"","title":"Slides-to-Video","type":"tags"},{"content":"When building login systems in applications, there are generally two parts: authentication and authorization. Authentication is the step that identifies who the user attempting to use the system is. Authorization is the step that decides whether that user is \u0026ldquo;allowed\u0026rdquo; to access or modify a particular resource on the system.\nA retail example: a cashier uses machines that can create new sales transaction records in the sales database. However, those machines should neither have access to past transactions nor be able to run queries that extract them for analysis and summaries. In such a system, the \u0026ldquo;cashier\u0026rdquo; user is authenticated as a \u0026ldquo;cashier\u0026rdquo; and is only authorized to add transaction records. It has no authorization to access other forms of data or even to modify the prices of goods.\nWhen it comes to building such authorization systems in Golang applications, we can simply code them out. Where we need authorization controls on API endpoints, we can write code that checks whether a particular user is allowed to access a particular endpoint, e.g. only an admin user of an API system may delete users of the system. 
However, naive implementations generally couple such authorization tightly with the code. It also becomes hard to go through the entire codebase to identify which kinds of users have access to which endpoints.\nI suppose this is part of the reason why a whole domain-specific language was created for this. The Open Policy Agent project has a language called Rego, a domain-specific language designed specifically for writing authorization policies.\nLet\u0026rsquo;s write a sample authorization policy for our application with the following (somewhat contrived) requirements:\nThe API endpoint is /salary/\u0026lt;user id\u0026gt;, and users may only access their own user id. The HTTP method used to access this endpoint is \u0026ldquo;GET\u0026rdquo;. The endpoint is only allowed for users who are still \u0026ldquo;subscribed\u0026rdquo;, as denoted by an \u0026ldquo;expiry_year\u0026rdquo; field holding the final year of their subscription. Naturally, an admin should also have access to this resource without any issue.\nWritten in Rego, the authorization policy would probably look something like this. 
Reference for the Rego language: https://www.openpolicyagent.org/docs/latest/policy-language/\npackage example.authz import future.keywords.if import future.keywords.in default allow := false allow if { input.method == \u0026#34;GET\u0026#34; input.path == [\u0026#34;salary\u0026#34;, input.subject.user] input.expiry_year \u0026gt;= 2020 } is_admin if input.subject.user == \u0026#34;admin\u0026#34; allow if is_admin Here is some sample Golang code that uses the above policy:\npackage main import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;github.com/open-policy-agent/opa/rego\u0026#34; ) func main() { ctx := context.TODO() query, err := rego.New( rego.Query(\u0026#34;aa = data.example.authz.allow\u0026#34;), rego.Load([]string{\u0026#34;./example.rego\u0026#34;}, nil), // rego.Module(\u0026#34;example.rego\u0026#34;, module), ).PrepareForEval(ctx) if err != nil { panic(err) } input := map[string]interface{}{ \u0026#34;method\u0026#34;: \u0026#34;GET\u0026#34;, \u0026#34;path\u0026#34;: []interface{}{\u0026#34;salary\u0026#34;, \u0026#34;bob\u0026#34;}, \u0026#34;subject\u0026#34;: map[string]interface{}{ \u0026#34;user\u0026#34;: \u0026#34;bob\u0026#34;, \u0026#34;groups\u0026#34;: []interface{}{\u0026#34;sales\u0026#34;, \u0026#34;marketing\u0026#34;}, }, \u0026#34;expiry_year\u0026#34;: 2050, } results, err := query.Eval(ctx, rego.EvalInput(input)) if err != nil { panic(err) } fmt.Printf(\u0026#34;%+v\\n\u0026#34;, results) } The output of the above code would look like the following:\n[{Expressions:[true] Bindings:map[aa:true]}] If we change expiry_year to 2000, the aa value in the bindings map should become false.\nTake note of how we set up the input variable in the Golang code. Initially, I thought that the input data to be evaluated could only be written as Golang maps with interface{} values. 
However, it\u0026rsquo;s possible to use structs as well. In general, structs would be my preference, since maps with string keys and interface{} values are not the most pleasant to work with. The important thing when using structs is to define json struct tags; otherwise the field names won\u0026rsquo;t match what the policy expects, and it won\u0026rsquo;t work as expected.\npackage main import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;github.com/open-policy-agent/opa/rego\u0026#34; ) func main() { ctx := context.TODO() query, err := rego.New( rego.Query(\u0026#34;aa = data.example.authz.allow\u0026#34;), rego.Load([]string{\u0026#34;./example.rego\u0026#34;}, nil), // rego.Module(\u0026#34;example.rego\u0026#34;, module), ).PrepareForEval(ctx) if err != nil { panic(err) } input := map[string]interface{}{ \u0026#34;method\u0026#34;: \u0026#34;GET\u0026#34;, \u0026#34;path\u0026#34;: []interface{}{\u0026#34;salary\u0026#34;, \u0026#34;bob\u0026#34;}, \u0026#34;subject\u0026#34;: map[string]interface{}{ \u0026#34;user\u0026#34;: \u0026#34;bob\u0026#34;, \u0026#34;groups\u0026#34;: []interface{}{\u0026#34;sales\u0026#34;, \u0026#34;marketing\u0026#34;}, }, \u0026#34;expiry_year\u0026#34;: 2050, } type hehe struct { User string `json:\u0026#34;user\u0026#34;` Groups []string `json:\u0026#34;groups\u0026#34;` } type hoho struct { Subject hehe `json:\u0026#34;subject\u0026#34;` } zz := hoho{ Subject: hehe{ User: \u0026#34;admin\u0026#34;, Groups: []string{\u0026#34;testing\u0026#34;}, }, } results, err := query.Eval(ctx, rego.EvalInput(input)) if err != nil { panic(err) } fmt.Printf(\u0026#34;%+v\\n\u0026#34;, results) results, err = query.Eval(ctx, rego.EvalInput(zz)) if err != nil { panic(err) } fmt.Printf(\u0026#34;%+v\\n\u0026#34;, results) } In a future blog post, I will probably cover a more in-depth example by embedding Rego in a Golang HTTP 
server.\n","date":"26 April 2023","externalUrl":null,"permalink":"/writing-rego-policies-for-authorization-in-golang-apps/","section":"Posts","summary":"When building login systems in applications, there are 
generally two parts: authentication and authorization. Authentication is the step that identifies who the user attempting to use the system is. Authorization is the step that decides whether that user is “allowed” to access or modify a particular resource on the system.\n","title":"Writing Rego Policies for authorization in Golang Apps","type":"posts"},{"content":"I hate YouTube Shorts with a passion. YouTube Shorts are a plague in my life, and their main purpose seems to be to drag me into wasting hours watching stupid short clips that are usually only mildly amusing. At the end of it all, I don\u0026rsquo;t feel satisfied or entertained after wasting those hours. (Maybe it\u0026rsquo;s just my age catching up with me, following the usual trend of older people hating the new hyped thing.)\nHowever, regardless of what one thinks of them, it would be nice to stop those videos from even showing up in my browser. I use YouTube the most on my own computer, so it makes sense to start there. I never really had the drive to do this until a recent change on the YouTube website: you can no longer tell the website to hide the YouTube Shorts shelf for a month. It is now a permanent feature, and you\u0026rsquo;re forced to see it no matter how you feel about it. I guess that was the final straw that pushed me to build something for this (it\u0026rsquo;s a learning opportunity as well\u0026hellip;)\nTo start learning how to build a Chrome extension, it seemed best to begin from a quickstart example. Luckily, the Chrome extension documentation has one, so we can simply copy and paste some code to get something working. 
https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/\nAfter getting an extension into the browser, the next step is to figure out how to run some JavaScript that does the required magic of removing the trashy content from the webpage. The following page is a good guide on where to write the JavaScript that manipulates the page: https://developer.chrome.com/docs/extensions/mv3/getstarted/tut-reading-time/\nAfter some trial and error, I finally got some hacky JavaScript into the Chrome extension and got it working. The following JavaScript removes YouTube Shorts shelves as well as any video entry that points to a YouTube Short.\nfunction listener() { // Remove all youtube shorts shelves const aa = document.querySelectorAll(\u0026#39;ytd-reel-shelf-renderer\u0026#39;); aa.forEach((a) =\u0026gt; {a.remove()}); console.info(\u0026#34;deleted youtube shorts shelves\u0026#34;); // Remove all youtube shorts video const bb = document.querySelectorAll(\u0026#39;#video-title\u0026#39;); bb.forEach(item =\u0026gt; { if (item.getAttribute(\u0026#39;href\u0026#39;) == null) { return; } if (item.href.includes(\u0026#39;https://www.youtube.com/shorts\u0026#39;)) { item.closest(\u0026#39;ytd-video-renderer\u0026#39;)?.remove(); console.info(\u0026#39;deleted youtube shorts video\u0026#39;); } }); } var timeout = null; document.addEventListener(\u0026#34;DOMSubtreeModified\u0026#34;, function() { if(timeout) { clearTimeout(timeout); } timeout = setTimeout(listener, 500); }, false); There are a few things to note about the JavaScript code above. First, we need the event listener because the YouTube website doesn\u0026rsquo;t load all of its content in one go. It pulls in more HTML/JS content as you scroll down the page, which results in further rendering. 
A naive attempt to remove the \u0026ldquo;offensive\u0026rdquo; HTML elements once the page has loaded is insufficient. We need to keep checking the page content every so often and clean it up in some sort of loop. The code above does this by debouncing. Every DOM change resets a 500 ms timer, and the cleanup only runs once the page has been quiet for that long.\nAnother important thing about this JavaScript is that it\u0026rsquo;s extremely inefficient, and the impact shows. The moment a YouTube page is scrolled, CPU usage climbs rather quickly, so this implementation is probably not the best. (DOMSubtreeModified is also a deprecated mutation event; MutationObserver is its standard replacement and may be worth trying here.) I\u0026rsquo;ll revisit it in the future if the performance issues keep bothering me.\nI will probably continue to add more features, such as removing an entire class of content (e.g. reaction videos) as well as videos under 10 seconds. But that\u0026rsquo;ll be for another post.\n","date":"19 April 2023","externalUrl":null,"permalink":"/chrome-extension-to-get-rid-of-youtube-shorts/","section":"Posts","summary":"I hate YouTube Shorts with a passion. YouTube Shorts are a plague in my life, and their main purpose seems to be to drag me into wasting hours watching stupid short clips that are usually only mildly amusing. At the end of it all, I don’t feel satisfied or entertained after wasting those hours. 
(Maybe it’s just my age catching up with me, following the usual trend of older people hating the new hyped thing.)\n","title":"Chrome Extension to get rid of Youtube Shorts","type":"posts"},{"content":"","date":"19 April 2023","externalUrl":null,"permalink":"/categories/personal/","section":"Article Categories","summary":"","title":"Personal","type":"categories"},{"content":"","date":"19 April 2023","externalUrl":null,"permalink":"/tags/personal/","section":"Technology Tags","summary":"","title":"Personal","type":"tags"},{"content":"I have a little side project at work that requires me to allow a pod within a Kubernetes cluster to query and manipulate resources in that cluster. 
This provides a kind of special development environment within a pod, with the capability to update the cluster. To do this, we need to add a set of Roles, ClusterRoles and their bindings (essentially Kubernetes\u0026rsquo; RBAC system) that allow the pod to access those resources.\nAn important note: NEVER RUN THIS IN PRODUCTION ENVIRONMENTS. The following configuration grants a single pod far more power than it needs. Anyone who manages to get into that pod would be able to wreck the cluster. Then again, if someone can already access and enter a pod in your cluster, you have more critical security concerns to address.\nFirst, let\u0026rsquo;s set up a simple pod to show that we can\u0026rsquo;t use the kubectl command effectively by default. Let\u0026rsquo;s create a Kubernetes cluster; in my case, I created mine in Google Kubernetes Engine. Once the cluster is up and running, we can create a Deployment resource, which creates our pod. This can be done with the following command.\nkubectl create deployment lol --image=nginx Let\u0026rsquo;s then list the pods that were created and open a bash shell in the pod.\nkubectl get pods kubectl exec -it \u0026lt;pod name\u0026gt; -- /bin/bash The next step is to get the kubectl command into the container. The nginx image is convenient here because its default command keeps the container running as a server. Other images, such as debian and ubuntu, would need some sort of \u0026ldquo;sleep\u0026rdquo; command to keep them running for a longer period. 
The official installation guide for kubectl is here:\nhttps://kubernetes.io/docs/tasks/tools/install-kubectl-linux/\nIn our case, we can simply run the following commands within the pod.\ncurl -LO \u0026#34;https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl\u0026#34; mv kubectl /usr/local/bin/kubectl chmod +x /usr/local/bin/kubectl kubectl If we then try to list the pods with the following command:\nkubectl get pods We\u0026rsquo;ll get the following output.\nError from server (Forbidden): pods is forbidden: User \u0026#34;system:serviceaccount:default:default\u0026#34; cannot list resource \u0026#34;pods\u0026#34; in API group \u0026#34;\u0026#34; in the namespace \u0026#34;default\u0026#34; This is expected. By default, most pods don\u0026rsquo;t need access to Kubernetes resources; e.g. an application serving business logic shouldn\u0026rsquo;t need to query any Kubernetes resources.\nNow, let\u0026rsquo;s remedy the situation so that the pod can access all the Kubernetes resources. First, let\u0026rsquo;s create a dedicated ServiceAccount resource, so that only pods that explicitly use it get the special access to the Kubernetes API. A pod created without pointing to a specific ServiceAccount uses the default service account, which remains locked down. We\u0026rsquo;re naming the ServiceAccount god since that\u0026rsquo;s essentially what it can do: read and manipulate literally anything on the cluster. Feel free to alter the example accordingly.\napiVersion: v1 kind: ServiceAccount metadata: name: god namespace: default The next step is to create the Role and RoleBinding. The Role and RoleBinding grant permissions on resources in the default namespace. 
The Role below grants every verb on every resource in the core API group within that namespace; the core group is the empty-string apiGroup and includes pods, services and configmaps. Resources in other API groups, such as Deployments in the apps group, would need additional rules.\napiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: god-role namespace: default rules: - apiGroups: - \u0026#34;\u0026#34; resources: - \u0026#39;*\u0026#39; verbs: - \u0026#39;*\u0026#39; The next YAML binds the Role above to a ServiceAccount, which in our case is the god ServiceAccount.\napiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: god-role-binding namespace: default roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: god-role subjects: - kind: ServiceAccount name: god namespace: default Having just the Role and RoleBinding above isn\u0026rsquo;t enough. With only those, we can\u0026rsquo;t run commands such as kubectl get pods --all-namespaces. That is a cluster-level operation, so the pod needs an appropriate ClusterRole to get that access.\napiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: god-role rules: - apiGroups: - \u0026#34;\u0026#34; resources: - \u0026#39;*\u0026#39; verbs: - \u0026#39;*\u0026#39; This is the binding configuration for the ClusterRole above:\napiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: god-role-binding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: god-role subjects: - kind: ServiceAccount name: god namespace: default Now that the RBAC permissions are all set up, we can finally alter our pod so that it gets the access it needs to do the magic. To do so, we\u0026rsquo;ll edit our Deployment resource.\nkubectl edit deployment lol The only 2 lines we need to add are serviceAccount and serviceAccountName. (serviceAccount is a deprecated alias of serviceAccountName, so setting serviceAccountName alone should also work.) 
Note that the YAML below is heavily abbreviated. It only demonstrates where to add the 2 key-value pairs so that the pod starts using the god ServiceAccount.\napiVersion: apps/v1 kind: Deployment metadata: name: lol ... spec: template: metadata: labels: app: lol spec: serviceAccount: god # Important lines to add serviceAccountName: god # Important lines to add ... Once that is done, exec into the pod once more and run the following commands to test it out. Editing the Deployment replaces the pod, so kubectl has to be installed again first. The commands should now work.\nkubectl get pods kubectl get pods --all-namespaces I\u0026rsquo;ll probably provide more context in the future on the side project and why this is needed, but for now, I\u0026rsquo;ll leave it as it is. Till next time\u0026hellip;\n","date":"12 April 2023","externalUrl":null,"permalink":"/running-kubectl-in-a-kubernetes-pod/","section":"Posts","summary":"I have a little side project at work that requires me to allow a pod within a Kubernetes cluster to query and manipulate resources in that cluster. This provides a kind of special development environment within a pod, with the capability to update the cluster. To do this, we need to add a set of Roles, ClusterRoles and their bindings (essentially Kubernetes’ RBAC system) that allow the pod to access those resources.\n","title":"Running kubectl in a Kubernetes Pod","type":"posts"},{"content":"I was watching a bunch of TikTok and YouTube videos recently and started to wonder how such companies serve videos to their consumers. That is how I went down the rabbit hole of how videos are served and how to make it possible to play a video without downloading the entire file.\nApparently, one of the technologies mentioned for streaming video from a server to a consumer device is HLS. 
HLS stands for HTTP Live Streaming. We\u0026rsquo;re not exactly doing \u0026ldquo;live streaming\u0026rdquo; if we\u0026rsquo;re just serving a video. On second thought, though, we are kind of \u0026ldquo;streaming\u0026rdquo; the contents of the video to the user, since we want the user to be able to watch the content before the entire video has loaded.\nTo produce HLS output, we can rely on the usual video manipulation tool, ffmpeg. To get the HLS form of a video, we need to convert it into the HLS format; this post includes an example command for this. The HLS format consists of a single playlist file that lists all the video segments that make up the entire video. The reason for breaking the file into smaller segments is to let the consumer device download a small piece of content and start playing without downloading the entire video. Downloading large files over the internet is usually not ideal for apps and websites, and smaller files work much better. If a connection breaks, it is still possible to restart the download of a small chunk. At the same time, with the HLS format, a player can download just a couple of chunks and immediately start playing the video.\nThis blog post won\u0026rsquo;t cover how to install ffmpeg on your workstation, but if you\u0026rsquo;re on a Mac, you can probably get it via Homebrew (brew).\nThe next step is to use ffmpeg to convert the target video we wish to serve into the HLS format. An HLS video consists of an m3u8 file, which serves as a kind of \u0026ldquo;manifest\u0026rdquo; file. This is the \u0026ldquo;single file\u0026rdquo; that points to the various video segments. 
This is the command to do so:\nffmpeg -i sample.mkv -c:a copy -f hls -hls_playlist_type vod output.m3u8 Let’s cover the effects of the flags used above:\n-i specifies the input file; the input above is sample.mkv.\n-c:a copy copies the audio stream into the output as-is, without re-encoding it.\n-f sets the output format, hls in this case.\n-hls_playlist_type is one of the options specific to the hls format. It is particularly needed here to ensure that all video segments are listed in the m3u8 file. By default, the playlist only keeps the last 5 entries (the hls_list_size option), so a player would only play the end of the video; vod forces hls_list_size to 0, which keeps every entry. Reference: https://stackoverflow.com/questions/65069045/only-last-four-entries-of-ts-files-found-in-out-m3u8-file-when-i-am-using-ffmpe\nRunning the command generates the m3u8 playlist and its ts video segment files. How can we test that the video has been encoded to the hls format correctly? We can use a common video player (in my case, I usually use VLC media player) and point it at a file server that serves the m3u8 file as well as the ts video segment files.\nThe Golang code for such a file server is available here:\npackage main import ( \"fmt\" \"log\" \"net/http\" ) func main() { http.Handle(\"/\", http.FileServer(http.Dir(\".\"))) fmt.Printf(\"Starting server on %v\\n\", 8080) log.Printf(\"Serving %s on HTTP port: %v\\n\", \".\", 8080) // serve and log errors log.Fatal(http.ListenAndServe(fmt.Sprintf(\":%v\", 8080), nil)) } For a guide on how to play m3u8 based videos in VLC, refer to the following reference: https://www.5kplayer.com/vlc/m3u8-vlc.htm\nThe next step after this is to render the video on an html page served by the Golang server. Unfortunately, the naive approach of using the html5 <video> tag doesn’t work in Chrome browsers.
Surprisingly, HLS is not natively supported in Chrome browsers (at the time of writing). Other streaming formats may be better supported, but a deeper dive is needed to find out how those work and how we can use ffmpeg to generate such video streams. For now, we have to use JavaScript-based solutions to provide HLS playback on Chrome.\nOne might argue - why not just ignore Chrome? Unfortunately, Chrome is still one of the most popular browsers in common use. At the same time, other browsers such as Edge are based on the Chromium project. It is pretty safe to assume that if Chrome doesn’t support HLS natively, other Chromium-based browsers wouldn’t provide such support either.\nFor the following quick example, I decided to go with a JavaScript library. video.js is one option that provides a “working” example (also, quite a number of reputable video companies are listed on its website), while the page below uses hls.js, another JavaScript library for HLS playback. With that, we can serve an html page that plays the HLS stream via JavaScript.\nThis would be the Golang server code (the page is served at /yoyo, while output.m3u8 and its segments are served from the current folder):\npackage main import ( \"fmt\" \"html/template\" \"log\" \"net/http\" ) type VideoServe struct { } func (h VideoServe) ServeHTTP(w http.ResponseWriter, r *http.Request) { t, err := template.ParseFiles(\"aaa.html\") if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError) return } if err := t.Execute(w, nil); err != nil { log.Print(err) } } func main() { http.Handle(\"/yoyo\", VideoServe{}) http.Handle(\"/\", http.FileServer(http.Dir(\".\"))) fmt.Printf(\"Starting server on %v\\n\", 8080) log.Printf(\"Serving %s on HTTP port: %v\\n\", \".\", 8080) // serve and log errors log.Fatal(http.ListenAndServe(fmt.Sprintf(\":%v\", 8080), nil)) }\nHere is the html page, aaa.html, that is served by the Golang server above:\n<html> <head> <title>Hls.js demo - basic
usage</title> <script src=\"//cdn.jsdelivr.net/npm/hls.js@latest\"></script> </head> <body> <center> <h1>Hls.js demo - basic usage</h1> <video height=\"600\" id=\"video\" controls></video> </center> <script> var video = document.getElementById('video'); if (Hls.isSupported()) { var hls = new Hls({ debug: true, }); // hls.loadSource('https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8'); hls.loadSource('http://localhost:8080/output.m3u8'); hls.attachMedia(video); hls.on(Hls.Events.MEDIA_ATTACHED, function () { video.muted = true; video.play(); }); } // hls.js is not supported on platforms that do not have Media Source Extensions (MSE) enabled. // When the browser has built-in HLS support (check using `canPlayType`), we can provide an HLS manifest (i.e. .m3u8 URL) directly to the video element through the `src` property. // This is using the built-in support of the plain video element, without using hls.js. else if (video.canPlayType('application/vnd.apple.mpegurl')) { // video.src = 'https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8'; video.src = 'http://localhost:8080/output.m3u8'; video.addEventListener('canplay', function () { video.play(); }); } </script> </body> </html> With that, we should be able to stream the video from our Golang server to the browser by opening http://localhost:8080/yoyo.
However, this needs more research; there are many other things to take note of and fix. One of them is running the ffmpeg encoding job on demand. The example above requires us to encode the video and have the transcoded files available before the user accesses it, which takes up extra storage space on our part (especially if storage space is tight in the first place).\nI’ll probably cover some of these in another blog post.\nReferences\nhttps://hlsjs.video-dev.org/demo/\nhttps://developers.cloudflare.com/stream/examples/hls-js/\nhttp://underpop.online.fr/f/ffmpeg/help/options-51.htm.gz\nhttps://ottverse.com/hls-packaging-using-ffmpeg-live-vod/\nhttps://ffmpeg.org/ffmpeg-formats.html#Options-10\nhttps://ffmpeg.org/ffmpeg-filters.html#subtitles\nhttps://superuser.com/questions/996149/how-do-i-map-vf-subtitles-with-ffmpeg\nhttps://medium.com/bootdotdev/create-a-golang-video-streaming-server-using-hls-a-tutorial-f8c7d4545a0f\nhttps://www.baeldung.com/linux/subtitles-ffmpeg\nhttps://trac.ffmpeg.org/wiki/HowToBurnSubtitlesIntoVideo\nhttps://ffmpeg.org/ffmpeg-formats.html#hls-2\nhttps://stackoverflow.com/questions/19782389/playing-m3u8-files-with-html-video-tag\n","date":"5 April 2023","externalUrl":null,"permalink":"/serving-videos-with-golang-via-hls/","section":"Posts","summary":"I was watching a bunch of TikTok and YouTube videos recently and started to wonder how such companies serve videos to their consumers. That is where I started going down the rabbit hole of how videos are served and how to ensure that videos can be played without having to download the entire video.\n","title":"Serving Videos with Golang via HLS","type":"posts"},{"content":"When dealing with applications - whether configuring them or deploying them to production - there is a high chance that we will need to deal with plenty of yaml.
Yaml is a fairly popular language (as of now) for configuration work. Other formats that are also used include ini, toml and json files, but we won’t be focusing on those in this post.\nWhen writing yaml, it can be a pain to ensure that the yaml has the proper structure. We could write a quick script that parses the yaml and checks its structure. However, we would then need to keep re-running that script as we edit the yaml, to catch mistakes early before they require editing huge swaths of yaml. Fortunately for us, the industry came up with a common-ish solution for this, JSON Schema, and modern IDEs have adopted it to help developers with linting as well as auto completion of yaml.\nTo get linting/auto completion for yaml in Visual Studio Code, we first need a plugin that can do so: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml. You can find it by searching for plugins that support yaml files in the plugin search tool within Visual Studio Code.\nOnce we have that in place, we can begin to write the file that provides the rules used to ensure our yaml has the right structure.\nLet’s say the yaml file that we intend to lint is called zzz.yaml. We can turn linting on by adding a settings file in the .vscode folder. This config file tells Visual Studio Code how it should behave when working with the code in this workspace.
In our case, we want to tell Visual Studio Code to use the following schema file when linting zzz.yaml.\nOur schema file is address.schema.json.\n{ \"$id\": \"https://example.com/example.schema.json\", \"$schema\": \"http://json-schema.org/draft-07/schema#\", \"title\": \"Example\", \"description\": \"An example of how to use snippets for json schema\", \"type\": \"object\", \"properties\": { \"example-snippet\": { \"defaultSnippets\": [ { \"label\": \"foo\", \"body\": {\"test\": \"test\", \"test2\": \"test2\", \"test3\": {\"test3\": \"acac\"}} } ] }, \"object-mapper\": { \"type\": \"object\", \"patternProperties\": { \"[A-Za-z0-9]\": { \"required\": [\"test1\", \"test2\"], \"defaultSnippets\": [ { \"label\": \"deploy\", \"body\": {\"test1\": \"acac\", \"test2\": \"acac\"} } ], \"properties\": { \"test1\": {\"type\": \"string\"}, \"test2\": {\"type\": \"string\"} } } } }, \"item-mapper\": { \"minItems\": 1, \"maxItems\": 5, \"type\": \"array\", \"defaultSnippets\": [ { \"label\": \"aaa\", \"body\": {\"test1\": \"acac\", \"test2\":
\"acac\"} } ], \"items\": { \"required\": [\"test1\", \"test2\"], \"properties\": { \"test1\": {\"type\": \"string\"}, \"test2\": {\"type\": \"string\"} } } }, \"address\": { \"title\": \"Street Address\", \"description\": \"Street of your address\", \"type\": \"string\" } } } For our Visual Studio Code settings, in .vscode/settings.json within the workspace:\n{ \"yaml.schemas\": { \"address.schema.json\": [ \"zzz.yaml\" ] } } Once we have this in place, we can do a little magic within zzz.yaml.\nLet’s say we want to add a field for example-snippet. It may have a bunch of labels and values that we would need to type out, which gets a tad troublesome to do by hand every time. However, with the json schema configuration in place, we would see the following:\nVisual Studio Code prompts that it can auto complete that chunk for the example-snippet property. If we simply press tab, it immediately fills out the information (based on the “body” field). It would pop in the values like so:\nThis behaviour is based on just the following portion of the json schema:\n\"example-snippet\": { \"defaultSnippets\": [ { \"label\": \"foo\", \"body\": {\"test\": \"test\", \"test2\": \"test2\", \"test3\": {\"test3\": \"acac\"}} } ] } We can handle even more complex scenarios. Another example would be to ensure that certain fields exist within our yaml object.
Refer to the following section of our json schema:\n\"object-mapper\": { \"type\": \"object\", \"patternProperties\": { \"[A-Za-z0-9]\": { \"required\": [\"test1\", \"test2\"], \"defaultSnippets\": [ { \"label\": \"deploy\", \"body\": {\"test1\": \"acac\", \"test2\": \"acac\"} } ], \"properties\": { \"test1\": {\"type\": \"string\"}, \"test2\": {\"type\": \"string\"} } } } } In this case, if we have the object-mapper field in our yaml, each object within it must have the fields test1 and test2. If one doesn’t, Visual Studio Code will complain about the missing fields. Let’s say we set up our yaml file (which uses the json schema) as follows:\nexample-snippet: test: test test2: test2 test3: test3: acac object-mapper: aa: test1: acac test2: acac bb: test3: acac The bb field under object-mapper is flagged, since we expect each object in object-mapper to have the fields test1 and test2. The error would probably look like the following:\nWe will cover more complex scenarios in another blog post. More examples of json schema are available on the following website: https://www.schemastore.org/json/.
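To make the required rule concrete, here is a minimal Go sketch of the check that the extension performs for object-mapper. It is only an illustration: it is not a JSON Schema validator, and it is not how the YAML extension works internally. It works on the JSON equivalent of the yaml above so that only the Go standard library is needed. missingRequired is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// missingRequired mimics the schema rule under object-mapper: every entry
// must contain each key in required. (The schema's pattern [A-Za-z0-9]
// matches any key containing a letter or digit, so every entry here is
// checked.) It returns, per entry name, the required keys that are absent.
func missingRequired(doc []byte, required []string) (map[string][]string, error) {
	var parsed struct {
		ObjectMapper map[string]map[string]any `json:"object-mapper"`
	}
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return nil, err
	}
	problems := map[string][]string{}
	for name, entry := range parsed.ObjectMapper {
		for _, key := range required {
			if _, ok := entry[key]; !ok {
				problems[name] = append(problems[name], key)
			}
		}
	}
	return problems, nil
}

func main() {
	// JSON equivalent of the object-mapper part of the zzz.yaml example.
	doc := []byte(`{
	  "object-mapper": {
	    "aa": {"test1": "acac", "test2": "acac"},
	    "bb": {"test3": "acac"}
	  }
	}`)
	problems, err := missingRequired(doc, []string{"test1", "test2"})
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(problems))
	for name := range problems {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random
	for _, name := range names {
		fmt.Printf("object-mapper.%s is missing %v\n", name, problems[name])
	}
}
```

Running it should print object-mapper.bb is missing [test1 test2], which mirrors the complaint about bb described above.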
It’s interesting to see the various complex scenarios that can be covered. For example, it looks like it’s possible to write json schema files where certain fields become required only if other fields exist within the yaml (e.g. via the dependencies keyword in draft-07).\n","date":"31 March 2023","externalUrl":null,"permalink":"/yaml-linting-and-auto-completion-in-visual-studio-code/","section":"Posts","summary":"When dealing with applications - whether configuring them or deploying them to production - there is a high chance that we will need to deal with plenty of yaml. Yaml is a fairly popular language (as of now) for configuration work. Other formats that are also used include ini, toml and json files, but we won’t be focusing on those in this post.\n","title":"Yaml linting and auto completion in Visual Studio Code","type":"posts"},{"content":"For many people, cooking is not just a means of sustenance but a beloved hobby and a way to express creativity in the kitchen. However, one of the biggest challenges for home cooks is keeping track of their recipes, and possibly of interesting recipes from other people. In my opinion, it’s generally a good idea to keep a copy of such information on hand (since websites/videos hosting such recipes can eventually disappear). However, recording such information in plain text might be a tad “boring” - it’s also harder to parse and process further. In this blog post, we will explore using cooklang as a possible tool to “standardize” such information.\nCooklang is a pretty interesting project that provides a way to write down recipes so that computers/scripts can understand and process the information further.
When it comes to recipes, there are several important pieces of information that one needs to take note of to be able to reproduce a dish:\nIngredients and their respective amounts\nNumber of servings that the recipe produces\nSteps for the cook to follow in order to reproduce the meal\nEquipment that might be needed to produce the meal\nTimings for how long a step takes (e.g. boil a potato for 10 minutes)\nIt might be convenient to simply put the information for a recipe into plain old text and make it available on the internet. However, that would make it harder to build some sort of web service that can parse and process the information to make it more useful. One useful feature after viewing a recipe would be automatically adding the ingredients needed for the dish to some sort of shopping list. With plain text, that might prove a little too troublesome. Lucky for us, cooklang has a standardized syntax that makes parsing recipes possible, which in turn allows us to extract even more information from them.\nAn example recipe written with cooklang would look like the following:\n>> tags: american,breakfast\n>> servings: 1\nPlace @bacon{2%slices} in a #large skillet{}. Cook over medium heat until browned. Drain, crumble, and set aside. In a #stock pot{}, melt @margarine{1/9%cup} over medium heat. Whisk in @flour{1/9%cup} until smooth. Gradually stir in @milk{7/6%cup}, whisking constantly until thickened. Stir in @large baked potatoes{2/3} and @green onions{2/3}. Bring to a boil, stirring frequently. Reduce heat to low, and simmer for ~{10%minutes}. Mix in bacon, @shredded cheddar cheese{1/5%cup}, and @sour cream{1/6%cup}. Then add @salt, and @pepper to taste. Continue cooking, stirring frequently until cheese is melted.
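Before going through the symbols, here is a rough Go sketch of the shopping-list idea: pulling the ingredients and their amounts out of a recipe with a regular expression. It is a simplification for illustration only. It ignores comments, timers and the rest of the spec, and it is not how the cooklang-go library used later in this post works. Ingredient and extractIngredients are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Ingredient is a simplified view of one "@" entry in a recipe.
type Ingredient struct {
	Name     string
	Quantity string // part before "%", e.g. "2"; empty if not given
	Unit     string // part after "%", e.g. "slices"; empty if not given
}

// ingredientRe approximates two ingredient forms:
//   - multi-word names closed by braces, e.g. @large baked potatoes{2/3}
//   - single-word names without braces, e.g. @salt
//
// The first alternative refuses to run past another @, # or ~ marker.
var ingredientRe = regexp.MustCompile(`@([^@#~{}\n]+?)\{([^}]*)\}|@(\w+)`)

// extractIngredients returns the ingredients in the order they appear.
func extractIngredients(recipe string) []Ingredient {
	var out []Ingredient
	for _, m := range ingredientRe.FindAllStringSubmatch(recipe, -1) {
		if m[3] != "" { // single-word form, no amount given
			out = append(out, Ingredient{Name: m[3]})
			continue
		}
		qty, unit, _ := strings.Cut(m[2], "%")
		out = append(out, Ingredient{Name: strings.TrimSpace(m[1]), Quantity: qty, Unit: unit})
	}
	return out
}

func main() {
	recipe := "Place @bacon{2%slices} in a #large skillet{}.\n" +
		"Stir in @large baked potatoes{2/3} and @green onions{2/3}.\n" +
		"Then add @salt, and @pepper to taste."
	for _, ing := range extractIngredients(recipe) {
		fmt.Printf("%-22s %-4s %s\n", ing.Name, ing.Quantity, ing.Unit)
	}
}
```

A hand-rolled pattern like this breaks down quickly on real recipes, which is exactly why a proper parser for the spec is the better option.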
Notice the various symbols dotted across the entire recipe:\nThe “@” symbol indicates an ingredient.\nThe “#” symbol indicates a piece of equipment (cookware) for the recipe.\nThe curly braces “{}” after an ingredient/equipment close a multi-word name and hold the amount needed, with “%” separating the quantity from the unit (e.g. @bacon{2%slices}).\nThe “~” symbol indicates a timer, e.g. ~{10%minutes}.\nThe “>>” symbol denotes metadata associated with the recipe. We can use it to set “tags”, such as the cuisine of the dish, which we can then use to filter recipes.\nFor the full specification of recipes written with cooklang, refer to the following page:\nhttps://cooklang.org/docs/spec/\nTechnically, we could use the provided CLI tool to parse our recipe. Fortunately, though, someone took the effort to create a Golang parser for recipes written with cooklang. Github repo: https://github.com/justintout/cooklang-go. We can simply use this library to write code that understands our recipe.\nLet’s say we want to write a piece of code that gets the ingredients that a recipe needs. We could then pass the output of that code to some sort of application that serves as a shopping list.\nAnother piece of information worth teasing out is the “tag” information, which we could use to filter recipes. Imagine a scenario where we store information on 1000s of recipes. It might be hard to remember recipes by name, so tags help us find a recipe more easily.\nThis blog post will only focus on extracting such information from a recipe, though.
The ideas mentioned in the two paragraphs above might be covered in another post.\nHere is the Golang code to extract the ingredients and to split the tags (stored as a comma separated string) into individual values:\npackage main import ( \"fmt\" \"strings\" cooklang \"github.com/justintout/cooklang-go\" ) func main() { fmt.Println(\"Begin golang code\") zzz := cooklang.MustParseFile(\"zzz.cook\") // fmt.Printf(\"Test %+v\\n\", zzz) // fmt.Printf(\"%+v\", zzz.Ingredients) for i := range zzz.Ingredients { fmt.Println(i) } for j, k := range zzz.Metadata { if j == \"tags\" { zz := fmt.Sprintf(\"%v\", j) for _, a := range strings.Split(k, \",\") { zz = fmt.Sprintf(\"%v == %v\", zz, a) } fmt.Println(zz) } } } I will continue experimenting with this and will probably use it for my own recipes. There are a few things to take note of while using cooklang, but those will probably be covered in the next blog post.\n","date":"20 March 2023","externalUrl":null,"permalink":"/trying-cooklang-with-golang-to-document-recipes/","section":"Posts","summary":"For many people, cooking is not just a means of sustenance but a beloved hobby and a way to express creativity in the kitchen. However, one of the biggest challenges for home cooks is keeping track of their recipes, and possibly of interesting recipes from other people. In my opinion, it’s generally a good idea to keep a copy of such information on hand (since websites/videos hosting such recipes can eventually disappear). However, recording such information in plain text might be a tad “boring” - it’s also harder to parse and process further.
In this blog post, we will explore using cooklang as a possible tool to “standardize” such information.\n","title":"Trying cooklang with Golang to document recipes","type":"posts"},{"content":" Introduction\nI used to work with Google Analytics to obtain site analytics for websites and Android applications. In fact, this blog is monitored using Google Analytics. Monitoring website data is generally useful as it tells website owners which content their visitors find the most useful. With such information, it is easier for the owner to add more of that relevant content, which would hopefully spur a virtuous cycle of gaining more audience for the website.\nOne of the irritating bits when working with Google Analytics is that, in general, you don’t have easy access to the raw data being collected from the website. Most users of Google Analytics might not need it much; however, it may be pretty important for bigger and more sophisticated users of the tool. They may want to augment the raw data with their own custom data to make their analysis of website visits more useful, but raw data access is quite hard to achieve. In some cases, one can access the raw data, but it requires paying for a pretty expensive business plan. This may not be relevant now, but it was true in the past: there was a premium plan priced based on the amount of data collected by Google Analytics.\nOne of the random things I wondered about was whether it is possible to avoid paying for an expensive plan just to obtain data that you should otherwise be able to access freely.
But before we get to that stage, we first need to understand a little about how data is collected via Google Analytics in the first place.\nTo collect website visit data using Google Analytics, you first need to create an “analytics account” that identifies the “business” we’re trying to monitor. Once the “account” is created, we can open it and retrieve information such as the JavaScript snippet that needs to be embedded into our website to start collecting information. The JavaScript snippet loads the actual JavaScript functions from Google Analytics servers over the internet. Those functions then send http GET/POST requests back to the Google Analytics servers, which collect and collate the information.\nBy default, the Google Analytics JavaScript added to a website points to Google Analytics servers. It would be nice if we could simply “hijack” that functionality and point it to our own custom endpoint instead, which would mean that we collect the raw data ourselves. This also means that we have to do the hard work of sorting and storing all that data. If ten million data points come in each month, how should we handle and store such data? And how should it be stored so that it is easy to query in the future?\nInterestingly enough, there is a way to set a custom endpoint for the Google Analytics JavaScript snippet. The details of how this is done are available in the following blog post: https://www.simoahava.com/gtm-tips/send-google-analytics-requests-custom-endpoint/.
We won’t go through how Google Analytics works internally; we’re just demonstrating how to configure the Google Analytics JavaScript to send its analytics http requests to a custom endpoint on a Golang service.\nConfiguring it\nThe first part is to define the html templates that represent our “website”. These are simple html pages. We also define our analytics JavaScript snippet as a template that is injected into other templates (so that we don’t have to copy it everywhere).\nOur JS snippet - the snippet is obtained from the Google Analytics “analytics account” that we need to create manually. Do note the slight difference here: we added additional configuration to the last gtag function call. The transport_url parameter sets where the Google Analytics http requests are sent to. The forceSSL parameter controls whether the snippet “promotes” all http requests to https requests. Https is definitely a good default, but for testing purposes it is nice to avoid it, since it’s a pain to set up.\nThis is saved as the “header.tmpl” file:\n{{define \"analytics2\"}} <!-- Google tag (gtag.js) --> <script async src=\"https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXX\"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-XXXXXXXX', { transport_url: 'http://localhost:8080/analytics', forceSSL: \"false\", }); </script> {{end}} Our main “index.tmpl” file.
It injects our analytics snippet into the page’s head.\n{{define \"index\"}} <html> <head> {{template \"analytics2\"}} </head> <body>This is index page</body> </html> {{end}} Our main Golang file is below. Don’t forget to set up Go modules (go mod init) for the project to prevent problems further down the line.\npackage main import ( \"log\" \"net/http\" \"text/template\" ) type basicWebsite struct{} func (b *basicWebsite) ServeHTTP(w http.ResponseWriter, r *http.Request) { files := []string{ \"./templates/header.tmpl\", \"./templates/index.tmpl\", } ts, err := template.ParseFiles(files...) if err != nil { log.Print(err.Error()) http.Error(w, \"Internal Server Error\", 500) return } err = ts.ExecuteTemplate(w, \"index\", nil) if err != nil { log.Print(err.Error()) http.Error(w, \"Internal Server Error\", 500) } } type GoogleAnalyticsParameters struct { // General ProtocolVersion string `json:\"protocol_version\"` TrackingID string `json:\"tracking_id\"` // User ClientID string `json:\"client_id\"` // Content Information DocumentLocationURL string `json:\"document_location_url\"` // System Info ScreenResolution string `json:\"screen_resolution\"` ViewportSize string `json:\"viewport_size\"` UserLanguage string `json:\"user_language\"` UserAgentArchitecture string `json:\"user_agent_architecture\"` UserAgentFullVersionList string `json:\"user_agent_full_version_list\"` UserAgentMobile bool `json:\"user_agent_mobile\"` UserAgentModel string `json:\"user_agent_model\"` UserAgentPlatform string
`json:\"user_agent_platform\"` UserAgentPlatformVersion string `json:\"user_agent_platform_version\"` // Hit HitType string `json:\"hit_type\"` NonInteractionHit bool `json:\"non_interaction_hit\"` } type analytics struct{} // Reference: // https://www.thyngster.com/ga4-measurement-protocol-cheatsheet/ func (a *analytics) ServeHTTP(w http.ResponseWriter, r *http.Request) { log.Println(\"start processing analytics request\") defer log.Println(\"end processing analytics request\") ga_params := GoogleAnalyticsParameters{} // General ga_params.ProtocolVersion = r.URL.Query().Get(\"v\") ga_params.TrackingID = r.URL.Query().Get(\"tid\") // User ga_params.ClientID = r.URL.Query().Get(\"cid\") // Content Information ga_params.DocumentLocationURL = r.URL.Query().Get(\"dl\") // System Info ga_params.ScreenResolution = r.URL.Query().Get(\"sr\") ga_params.ViewportSize = r.URL.Query().Get(\"vp\") ga_params.UserLanguage = r.URL.Query().Get(\"ul\") ga_params.UserAgentArchitecture = r.URL.Query().Get(\"uaa\") ga_params.UserAgentFullVersionList = r.URL.Query().Get(\"uafvl\") if r.URL.Query().Get(\"uamb\") == \"1\" { ga_params.UserAgentMobile = true } ga_params.UserAgentModel = r.URL.Query().Get(\"uam\") ga_params.UserAgentPlatform = r.URL.Query().Get(\"uap\") ga_params.UserAgentPlatformVersion = r.URL.Query().Get(\"uapv\") // Hit ga_params.HitType = r.URL.Query().Get(\"t\") if r.URL.Query().Get(\"ni\") == \"1\" { ga_params.NonInteractionHit = true } log.Printf(\"%+v\\n\", ga_params) } func main() { http.Handle(\"/index\", &basicWebsite{})
http.Handle(\"/analytics/collect\", &analytics{}) http.Handle(\"/analytics/g/collect\", &analytics{}) log.Fatal(http.ListenAndServe(\":8080\", nil)) } Our website has 2 main endpoints. The /index endpoint is the main entry point of the website; it renders the index.tmpl template, which triggers the JavaScript analytics calls. The analytics http requests are sent to /analytics/g/collect (the transport_url followed by /g/collect). These requests carry their data as plenty of query parameters in the URL, which is why we have a large function that parses the query parameters and extracts the appropriate data. Even so, this doesn’t cover all possible query parameters; there are plenty that weren’t covered here. Those might be covered in a future blog post, where we could use this custom mechanism to capture analytics from arbitrary events such as the click of a button.\nReference\nFor reference on how the server looks, refer to the following GitHub link (to this specific folder - the code may move in the future, so explore the repo to find the most relevant code related to this):\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/analytics\n","date":"28 February 2023","externalUrl":null,"permalink":"/custom-endpoint-for-google-analytics-data-with-golang/","section":"Posts","summary":" Introduction\nI used to work with Google Analytics to obtain site analytics for websites and Android applications. In fact, this blog is monitored using Google Analytics. Monitoring website data is generally useful as it tells website owners which content their visitors find the most useful.
With such information, it is easier for the owner to add new content that is relevant to visitors, which would hopefully spur a virtuous cycle of gaining more audience for the website.\n","title":"Custom Endpoint for Google Analytics data with Golang","type":"posts"},{"content":"There are some cases where we need to host an application on our workstation but still expose it publicly so that people can access it over the internet. There could be a variety of reasons for this; e.g. data locality (there is too much data to transfer to the cloud, and it might cost too much to store it there), application sensitivity (the application must stay available on the local network even when there is no internet connection, so running it only from the public cloud is not an option), or the application can only run on certain types of environment (e.g. macOS). Most cloud vendors usually only provide Windows and Linux machines; macOS environments are rare.\nWe have to do a bunch of things to make this work; we\u0026rsquo;ll cover the steps in this blog post.\nGetting a virtual machine from a public cloud # One of the steps involved is getting a virtual machine from a public cloud. We can do this manually by going through the cloud provider\u0026rsquo;s usual UI and requesting a virtual machine for our use.\nWe want to make sure that we are able to ssh from our local workstation into said virtual machine. We can test it via the following command:\nssh -i ~/.ssh/virtual_machine_user virtual_machine_user@34.100.100.100 In the above example, we have a virtual machine that is available on IP address 34.100.100.100. Let\u0026rsquo;s say we created a new ssh key just for this scenario: we can put the private key into the usual ~/.ssh folder and reference the file with the -i flag of the ssh command. 
Naturally, you would want to make sure that the public half of the new ssh key is added to the virtual machine (e.g. to the ~/.ssh/authorized_keys file of virtual_machine_user).\nSetting up application and the tunnel # On our local workstation, we simply set up the application as usual. The important part is to ensure that the application is accessible from the local workstation itself.\nTo set up the ssh tunnel, we run the following. The important part is the -R flag: -R 8080:127.0.0.1:8080 makes sshd listen on port 8080 of the remote machine and forwards every connection on that port back through the tunnel to 127.0.0.1:8080 on the local workstation, where our application is listening. By default, sshd binds such a remote port to the loopback interface only, which is why nginx on the virtual machine is used to expose it publicly.\nThe -N flag tells the ssh command that there is no remote command to run. Usually, a shell is started if no command is provided, but we don\u0026rsquo;t want that; all we want is a plain ssh tunnel that ships traffic from our virtual machine to our application on the local workstation.\nssh -i ~/.ssh/virtual_machine_user -R 8080:127.0.0.1:8080 -N virtual_machine_user@34.100.100.100 Setting up nginx on virtual machine # We expose the application via a virtual machine on a public cloud. This blog post won\u0026rsquo;t cover how to create the virtual machine itself. However, we will go through the steps to set up nginx on it. To install nginx, we can run the following commands.\nsudo apt update sudo apt install -y nginx With this, nginx is available on the virtual machine. At this point, we can do a quick test to make sure that nginx is accessible from the internet. If there are issues with accessing the nginx server, there are probably firewall rules that need to be configured accordingly.\nThe next step is to configure nginx so that when a user accesses the nginx server, the request is forwarded to the local workstation\u0026rsquo;s application. 
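The forwarding hinges on a path rewrite: nginx strips the /hehe prefix before passing the request to port 8080. As a rough illustration only, the Go sketch below models what the rewrite rule used in this post (rewrite ^/hehe/?(.*)$ /$1 break;) does to a request path. The function name rewriteHehe is hypothetical, it ignores query strings (nginx matches the regex against the path only), and it is not how nginx is implemented.

```go
package main

// rewriteHehe is a hypothetical helper that mimics the nginx directive
// rewrite ^/hehe/?(.*)$ /$1 break; for a request path without a query string.
// The boolean result reports whether the rule matched.
func rewriteHehe(path string) (string, bool) {
    const prefix = `/hehe`
    if len(path) < len(prefix) || path[:len(prefix)] != prefix {
        return path, false // not handled by location /hehe
    }
    rest := path[len(prefix):]
    // The optional /? in the regex consumes at most one slash.
    if len(rest) > 0 && rest[0] == '/' {
        rest = rest[1:]
    }
    // The (.*) capture becomes the new path after a leading slash.
    return `/` + rest, true
}

func main() {
    for _, p := range []string{`/hehe`, `/hehe/`, `/hehe/api/items`, `/hehefoo`, `/other`} {
        out, ok := rewriteHehe(p)
        println(p, out, ok)
    }
}
```

Note the /hehefoo case: location /hehe is a plain prefix match, so that request is also proxied and arrives at the application as /foo. If that is not intended, a location /hehe/ block is the stricter choice.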
We can do this by adding the following location block inside the server block of the /etc/nginx/sites-available/default file:\nlocation /hehe { rewrite ^/hehe/?(.*)$ /$1 break; proxy_pass http://127.0.0.1:8080; } This means that any request whose path starts with /hehe has that prefix stripped and is then proxied to port 8080 on the virtual machine, which the ssh tunnel forwards to our application. This is not an HTTP redirect; the user never sees the rewritten path.\nNaturally, changes to the nginx configuration file only take effect after a reload; nginx -t checks the configuration for errors first\nsudo nginx -t sudo nginx -s reload Conclusion # The above is a basic setup of an ssh tunnel to access an application on our local workstation. However, there are definitely things to take note of with this setup. It is hard to recommend this approach unless you have a particularly good reason not to move your application to the cloud. It almost feels as though the setup is only suitable for \u0026ldquo;testing\u0026rdquo; applications.\nSetting things up this way means deploying the application in a non-scalable way. It doesn\u0026rsquo;t make sense to create a bunch of ssh tunnels to the same virtual machine to expose the application publicly. Services set up this way should expect very little incoming traffic.\nAnother point of concern is that it is hard to guarantee the availability of the application. There are many possible points of failure: the application can fail, and the ssh tunnel itself could hang or terminate prematurely.\n","date":"31 January 2023","externalUrl":null,"permalink":"/creating-a-ssh-tunnel-to-expose-a-web-application-from-a-workstation/","section":"Posts","summary":"There are some cases where we need to host an application on our workstation but still expose it publicly so that people can access it over the internet. There could be a variety of reasons for this; e.g. 
data locality (there is too much data to transfer to the cloud, and it might cost too much to store it there), application sensitivity (the application must stay available on the local network even when there is no internet connection, so running it only from the public cloud is not an option), or the application can only run on certain types of environment (e.g. macOS). Most cloud vendors usually only provide Windows and Linux machines; macOS environments are rare.\n","title":"Creating a SSH Tunnel to expose a web application from a workstation","type":"posts"},{"content":"Over the recent weekends, I\u0026rsquo;ve decided to take a gander and try another \u0026ldquo;serverless\u0026rdquo; tool called Google Cloud Workflows. The tool\u0026rsquo;s appeal is the ability to coordinate a bunch of services in order to achieve a particular goal. The coordination effort (or workflow) can easily get pretty complex. One option is to script it, but a workflow tool gives us a button to run the entire workflow from start to end with logging in place, as well as the ability to run the workflow based on particular triggers.\nLet\u0026rsquo;s have an example workflow that we intend to develop as follows:\nThe workflow would involve the following:\nRun an analysis on a csv file that is stored in Google Cloud Storage Generate a chart image out of the analysis Generate a pdf report that is to be sent to our \u0026ldquo;clients\u0026rdquo; Send email at the end Each of these steps can be automated. We will go into each of the steps one at a time.\nSetting up a fake Email Server # Notice that at the end, we would be sending an email to a \u0026ldquo;user\u0026rdquo;. For testing purposes, it wouldn\u0026rsquo;t make sense to use a real email service to send email. In order to send fake emails for testing purposes, we can use a tool called MailHog. 
If we\u0026rsquo;re happy with the workflow, we can add code to the \u0026ldquo;send email\u0026rdquo; service to use an actual email sending service such as SendGrid.\nTo run MailHog, we can simply create a small virtual machine (MailHog only stores the mails in memory, so the instance needs to stay up permanently; a preemptible instance would lose them).\nTo install MailHog on the server, run the following lines (or refer to the instructions in MailHog\u0026rsquo;s GitHub README):\nsudo apt-get -y install golang-go go get github.com/mailhog/MailHog Note that from Go 1.18 onwards, go get no longer builds or installs binaries; use go install github.com/mailhog/MailHog@latest instead. We need MailHog to run continuously: if the mail server goes down, it needs to be restarted. One way to do this is to have it managed by systemd. The mail server binary needs to be copied to an appropriate location so that systemd can run it. Run the following commands (the target directory has to exist, and writing under /usr/local/bin requires sudo):\nsudo useradd mailhog sudo mkdir -p /usr/local/bin/mailhog sudo cp ~/go/bin/MailHog /usr/local/bin/mailhog/MailHog Naturally, we also need to protect the service with a password. For testing purposes, we can add the following line to the file /usr/local/bin/mailhog/auth. The username and password are both \u0026ldquo;test\u0026rdquo;. 
Our \u0026ldquo;send email\u0026rdquo; service needs to send these credentials when accessing the MailHog mail server. Each line of the file has the form username:bcrypt-hash (MailHog can generate the hash with its bcrypt subcommand, e.g. MailHog bcrypt test).\ntest:$2a$04$V9Wl7HyqjdXS3FBbc0juGePhjf1GKkblJSqSt3HNC5fA7HzXA/8ua We then need to add the following file: /etc/systemd/system/mailhog.service\n[Unit] Description=Mailhog Requires=network-online.target After=network-online.target [Service] User=mailhog Group=mailhog Restart=on-failure ExecStart=/usr/local/bin/mailhog/MailHog -auth-file /usr/local/bin/mailhog/auth KillSignal=SIGTERM [Install] WantedBy=multi-user.target To get our service started, run the following commands:\nsudo systemctl daemon-reload sudo systemctl enable mailhog sudo systemctl start mailhog After this, if all goes well, we get the following status:\n$ sudo systemctl status mailhog ● mailhog.service - Mailhog Loaded: loaded (/etc/systemd/system/mailhog.service; enabled; vendor preset: enabled) Active: active (running) since Sun 2022-10-09 21:21:07 UTC; 23h ago Main PID: 10835 (MailHog) Tasks: 6 (limit: 2355) Memory: 19.6M CPU: 1.940s CGroup: /system.slice/mailhog.service └─10835 /usr/local/bin/mailhog/MailHog -auth-file /usr/local/bin/mailhog/auth Oct 10 20:35:07 mailhog MailHog[10835]: [APIv1] KEEPALIVE /api/v1/events Oct 10 20:36:07 mailhog MailHog[10835]: [APIv1] KEEPALIVE /api/v1/events With that, one piece of our workflow is up and running. Next, we need to deploy the services that handle the rest of our workflow.\nIn order to keep the workflow as \u0026ldquo;serverless\u0026rdquo; as possible, we run most of the workloads as Cloud Run services.\nRun analysis # For our sample workflow, we can create a simple Golang service (ideally, Python would be a better choice here since it has better library support for analysis work). 
Since this is just a sample workflow for trying out Cloud Workflows, the algorithm coded here is pretty simple, and it is assumed that the dataset is small enough that the algorithm doesn\u0026rsquo;t take too much time to run across it.\nThis service involves the following steps:\nRetrieve data (formatted as csv) from Google Cloud Storage Run some quick validation logic to make sure that the csv file is ok to be worked with Run an algorithm that summarizes the dataset Return output to the caller (which in our case is the Cloud Workflows tool) The example code is available in the following folder in the GitHub repo. It also includes the Dockerfile used to build the container image that holds the analysis binary.\nReference: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/runAnalysis\nMake charts # This is the next step of the workflow: making charts out of the summary produced by the analysis in the previous step. Charts are generally pretty complex to handle; most charting libraries target the frontend, and there aren\u0026rsquo;t many image libraries that can create chart images easily. 
Hence, the easiest approach is to use one of those frontend chart libraries (which look quite decent as well) and then take a screenshot of the generated chart.\nThis service would involve the following:\nReceive the chart properties as input via a POST request Render the chart via one of the routes in the application Use a Chromium browser to screenshot the rendered chart into an image file Save the image file into GCS Return the name of the image file in GCS Reference: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/makeCharts\nCreate Report # Naturally, business processes often involve sending analysis out as reports. This service accepts the analysis from the previous steps and embeds it in a PDF document.\nThis service would involve the following:\nReceive input on how to produce the report via a POST request Pull the markdown file used as the report template from GCS Pull chart images for the report from GCS Generate the PDF that would be \u0026ldquo;sent\u0026rdquo; to the \u0026ldquo;client\u0026rdquo; Save the PDF report into GCS Return the name of the PDF report (for reference in later steps) Reference: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/createReport\nSend Email # This is the last step of the entire journey; someone always needs to be at the receiving end of the analysis lifecycle. 
Analyses are usually consumed either through some form of dashboard or by having the analysis sent straight into the users\u0026rsquo; email inboxes.\nThis service would involve the following:\nReceive input on which report is to be sent Pull the report to be sent from GCS Send the report via SMTP to the email server (which in this case is MailHog) Reference: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/sendEmail\nActually setting up the workflow # The last step is to coordinate all of the above services together. Technically, we could write some scripts that we manually run, but scripts come with their own host of problems:\nIf the script that coordinates the services only lives on \u0026ldquo;someone\u0026rsquo;s\u0026rdquo; work computer, we won\u0026rsquo;t be able to resolve issues easily when they come about If it\u0026rsquo;s on that one person\u0026rsquo;s work computer, then it\u0026rsquo;s somewhat troublesome to ensure that the computer is up and running and runs the script without issue whenever it is needed We would need ways to properly monitor the runs of this \u0026ldquo;script\u0026rdquo; to ensure that it completes successfully and in time to meet business requirements The script may also need to manage state. 
In the case of the current example, we don\u0026rsquo;t need to store state, but if there were a need for it, we\u0026rsquo;d also have to ensure that the state is backed up (just in case) For the above workflow, we can program it like so; it is a mostly linear workflow:\nmain: params: [args] steps: - initializeWorkflow: assign: - sourceData: ${args.sourceData} - sendEmail: ${args.sendEmail} - reportTitle: ${args.reportTitle} - reportDescription: ${args.reportDescription} - runAnalysis: call: http.post args: url: https://run-analysis-xxxxx.a.run.app/run-analysis body: source_data: ${sourceData} result: runAnalysisResults - viewRunAnalysisBody: call: sys.log args: text: ${runAnalysisResults.body} - decodeRunAnalysisResults: call: json.decode args: data: ${runAnalysisResults.body} result: runAnalysisResultsBody - createChartImage: call: http.post args: url: https://make-charts-xxxxx.a.run.app/screenshot body: title: Sales Report x_axis_title: Product Names labels: ${runAnalysisResultsBody.products} data: ${runAnalysisResultsBody.revenue} result: createChartImageResults - decodeChartImageResults: call: json.decode args: data: ${createChartImageResults.body} result: createChartImageResultsBody - zzz: call: sys.log args: text: ${createChartImageResults.body} - createReport: call: http.post args: url: https://create-report-xxxxx.a.run.app/create-report body: title: ${reportTitle} description: ${reportDescription} template_file_name: haha.md image: ${createChartImageResultsBody.filename} result: createReportResults - decodeCreateReport: call: json.decode args: data: ${createReportResults.body} result: createReportResultsBody - sendEmailDecider: switch: - condition: ${sendEmail == false} steps: - earlyTerminatedStep: return: ${\u0026#34;Email is not sent. 
Please check \u0026#34; + createReportResultsBody.generated_report_name + \u0026#34; in GCS\u0026#34;} - sendEmail: call: http.post args: url: https://send-email-xxxxx.a.run.app/send-email body: to: test@test.com subject: This is another test body: Report Generated report_filename: ${createReportResultsBody.generated_report_name} - finalStep: return: \u0026#34;Report Generated. Please request receiver to check his email\u0026#34; To deploy it (with the definition above saved as zzz.yaml), we can run the following command:\ngcloud workflows deploy myFirstWorkflow --source=zzz.yaml And that makes the workflow pop into existence in the Google Cloud project of our choice.\nConclusion # Cloud Workflows is definitely an interesting tool to try out, but throughout the entire experience of \u0026ldquo;attempting\u0026rdquo; to use it, far more time was spent building the services that the workflow consumes than on the workflow itself. More complex workflows would require more intricate services to be developed, and hence more effort is needed before we get to try the more complicated features of Cloud Workflows.\nThere are some interesting things that might be worth thinking about/trying out in the future if I manage to think of appropriate use-cases:\nParallel steps in workflow Workflows calling other sub-workflows Retry of particular steps in workflows Having workflows await human responses ","date":"10 October 2022","externalUrl":null,"permalink":"/trying-out-google-cloud-workflows/","section":"Posts","summary":"Over the recent weekends, I’ve decided to take a gander and try another “serverless” tool called Google Cloud Workflows. The tool’s appeal is the ability to coordinate a bunch of services in order to achieve a particular goal. 
The coordination effort (or workflow) can easily get pretty complex. One option is to script it, but a workflow tool gives us a button to run the entire workflow from start to end with logging in place, as well as the ability to run the workflow based on particular triggers.\n","title":"Trying out Google Cloud Workflows","type":"posts"},{"content":"The leader election mechanism is a somewhat complex thing to code up for an application. There are various Golang libraries that assist with this, but it would be nicer if the environment that the application operates in could help with this. In the Kubernetes ecosystem, we can rely on the Kubernetes API server (backed by etcd) to do this leader election dance on our behalf. If we can tap into this mechanism, we can avoid introducing that complexity into our application.\nThis mechanism works by having each replica of the application attempt to create/update a shared lock object, such as a ConfigMap, Endpoints or Lease resource. Every object carries a \u0026ldquo;resource version\u0026rdquo;, and an update request includes the resource version that the client last read. If 2-3 replicas concurrently try to update the object from the same version, only 1 of the updates is accepted; the rest fail with a conflict. The replica whose update succeeded becomes the leader, and the rest become followers.\nLeader election in app via Kubernetes mechanics # Refer to the following codebase: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/leaderElectionWithK8s\nIt is somewhat fortunate that leader election is a common enough use case that it is available as a utility within the client-go Kubernetes client library. 
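Under the hood, that utility combines the resource version trick with a lock record that expires unless its holder keeps renewing it. The Go sketch below is a deliberately simplified, single-process model of one election attempt, meant only to make the mechanics concrete. All names in it (leaderRecord, lockObject, tryAcquire) are hypothetical and are not the client-go API. Times are plain seconds, and unlike client-go (which, as far as we can tell, measures expiry from the moment it locally observed the record change, to tolerate clock skew) it trusts the renewTime stored in the record.

```go
package main

// leaderRecord is a hypothetical, trimmed-down version of the record that a
// leader election lock stores: who holds it, when it was last renewed, and
// for how many seconds that renewal is valid.
type leaderRecord struct {
    holder        string
    renewTime     int64
    leaseDuration int64
}

// lockObject stands in for the ConfigMap or Lease object on the API server.
type lockObject struct {
    record          leaderRecord
    resourceVersion int
}

// update models an optimistic-concurrency write: it only succeeds when the
// writer saw the latest resourceVersion, and each successful write bumps it.
func (o *lockObject) update(seenVersion int, rec leaderRecord) bool {
    if seenVersion != o.resourceVersion {
        return false // conflict: another replica wrote first
    }
    o.record = rec
    o.resourceVersion++
    return true
}

// tryAcquire is one acquire-or-renew attempt by candidate id at time now,
// based on the snapshot of the lock object that the candidate read earlier.
func tryAcquire(o *lockObject, seen lockObject, id string, now, leaseDuration int64) bool {
    cur := seen.record
    heldByOther := cur.holder != `` && cur.holder != id
    if heldByOther && now < cur.renewTime+cur.leaseDuration {
        return false // the current leader renewed recently enough
    }
    next := leaderRecord{holder: id, renewTime: now, leaseDuration: leaseDuration}
    return o.update(seen.resourceVersion, next)
}

func main() {
    obj := &lockObject{}
    // All three replicas read the same, still empty, lock object.
    snapshot := *obj
    for _, id := range []string{`app-0`, `app-1`, `app-2`} {
        println(id, tryAcquire(obj, snapshot, id, 0, 10))
    }
    // At t=15, app-0 has not renewed its 10 second lease, so app-1 may take over.
    println(`app-1 at t=15`, tryAcquire(obj, *obj, `app-1`, 15, 10))
}
```

Running it prints true for app-0 and false for app-1 and app-2, because their writes carry a stale resource version; the last line prints true, since by t=15 the lease has lapsed. The same function also covers renewal: when the holder calls it again with a fresh snapshot, it simply rewrites its own record.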
Refer to the specific part of the client-go library here: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection\nThe first step is to create the resource lock: the Kubernetes object in which the leader election record (current holder, renew time and so on) is stored. I\u0026rsquo;ve generally dealt with leader election in Kubernetes Operators via ConfigMaps, which is the approach I\u0026rsquo;m more familiar with, so a ConfigMap-based lock is chosen here. Note that resourcelock.ConfigMapsLeasesResourceLock is a combined lock: it writes the leader record to both a ConfigMap and a Lease object, which is why the RBAC rules further down grant access to both. Both are minimal APIs that are well suited to leader election; newer client-go releases steer users towards the plain Lease lock (resourcelock.LeasesResourceLock).\nrl, err := resourcelock.NewFromKubeconfig(resourcelock.ConfigMapsLeasesResourceLock, \u0026#34;default\u0026#34;, \u0026#34;app-lock\u0026#34;, resourcelock.ResourceLockConfig{ Identity: POD_NAME, }, config, 10*time.Second) The next step is to fill in the LeaderElectionConfig struct, which holds the lock together with the timings and callbacks used for the election.\nctx := context.Background() LESettings := leaderelection.LeaderElectionConfig{ Lock: rl, LeaseDuration: 10 * time.Second, RenewDeadline: 5 * time.Second, RetryPeriod: 2 * time.Second, Callbacks: leaderelection.LeaderCallbacks{ OnStartedLeading: zzz, OnStoppedLeading: func() { fmt.Println(\u0026#34;Stopped\u0026#34;) panic(\u0026#34;stopped leading\u0026#34;) }, OnNewLeader: func(id string) { if id != POD_NAME { fmt.Println(\u0026#34;is not the leader\u0026#34;) leaderState = false } else { fmt.Println(\u0026#34;is the leader\u0026#34;) leaderState = true } }, }, Name: \u0026#34;debugging\u0026#34;, } // ... Some other code can go here leaderelection.RunOrDie(ctx, LESettings) The resource lock that we created earlier is passed here into the LeaderElectionConfig struct. 
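Before the election loop starts, client-go checks how the three timing values relate to each other, and RunOrDie panics if the check fails. As far as we can tell from the client-go source, the rules are: every duration must be positive, LeaseDuration must be greater than RenewDeadline, and RenewDeadline must be greater than RetryPeriod multiplied by a jitter factor of 1.2. The hypothetical helper below (timingsProblem is not part of client-go) restates those rules with millisecond integers, so a set of values can be sanity checked before deploying.

```go
package main

// jitterFactor is the 1.2 multiplier that, as far as we can tell from the
// client-go source, is applied to RetryPeriod when RenewDeadline is checked.
const jitterFactor = 1.2

// timingsProblem is a hypothetical helper (not part of client-go) that
// restates the ordering checks on LeaderElectionConfig timings, expressed in
// milliseconds. It returns an empty string when the values look acceptable.
func timingsProblem(leaseMs, renewMs, retryMs int64) string {
    switch {
    case leaseMs < 1 || renewMs < 1 || retryMs < 1:
        return `all durations must be greater than zero`
    case leaseMs <= renewMs:
        return `leaseDuration must be greater than renewDeadline`
    case float64(renewMs) <= jitterFactor*float64(retryMs):
        return `renewDeadline must be greater than retryPeriod*1.2`
    }
    return ``
}

func main() {
    // The values used in this post: 10s lease, 5s renew deadline, 2s retry.
    println(timingsProblem(10000, 5000, 2000) == ``) // true
    // Swapping lease duration and renew deadline is the misconfiguration
    // that makes RunOrDie panic.
    println(timingsProblem(5000, 10000, 2000))
}
```

With the values from this post all checks pass, since 5s is greater than 1.2 x 2s = 2.4s.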
Other things that we need to configure are:\nLease duration (how long non-leader candidates wait after the last observed renewal before they try to take over the \u0026ldquo;leader\u0026rdquo; role) Renew deadline (how long the current leader keeps retrying to renew its \u0026ldquo;leader\u0026rdquo; role before giving it up) Retry period (how long clients wait between attempts to acquire or renew the role) The renew deadline has to be less than the lease duration; if you configure it to be greater than or equal to the lease duration, you will see a runtime error and the application will panic and crash (due to the RunOrDie function)\nThe other parameter that needs to be filled in is the callbacks: what to do when this pod wins the election (OnStartedLeading), what to do when it loses the leader role (OnStoppedLeading), and what to do whenever a new leader is observed, which may be this pod or another one (OnNewLeader). The example above is a very simple one; additional error checks would probably be needed to make it work with fewer errors and less confusion.\nWith all of that, our leader election struct is fully configured and we can pass it to the RunOrDie function, which runs the leader election as the application runs. We can then decide what needs to be done when the pod becomes the leader.\nDeploying app with leader election # In general, I\u0026rsquo;d imagine that applications that require leader election would also need stable network identities. This is partially why, within that code base, the application is deployed via StatefulSets. This would allow us to potentially send data to specific pod endpoints that only leaders can handle (but this will be covered in a future post). Right now, the focus for this codebase is to test that leader election works as expected and that applications can become the leader when required.\nOne of the more important things to handle is the RBAC permissions needed to get this whole application to run. 
Firstly, we need cluster-wide permissions to read Pod information (used by the zzz function; a tad unnecessary, but that function existed before leader election was added). The important part for the leader election capability is the following RBAC spec.\napiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: leader-election-configmaps rules: - apiGroups: [\u0026#34;\u0026#34;] # \u0026#34;\u0026#34; indicates the core API group resources: [\u0026#34;configmaps\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;watch\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;create\u0026#34;, \u0026#34;update\u0026#34;] - apiGroups: [\u0026#34;coordination.k8s.io\u0026#34;] resources: [\u0026#34;leases\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34;, \u0026#34;create\u0026#34;, \u0026#34;update\u0026#34;] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: leader-election-configmaps subjects: - kind: ServiceAccount name: leader-election-app namespace: default roleRef: kind: Role name: leader-election-configmaps apiGroup: rbac.authorization.k8s.io The application definitely needs the ability to get, create and update configmaps and leases (list and watch may not be necessary, but I haven\u0026rsquo;t tested removing them). These permissions are defined in the leader-election-configmaps Role, which the RoleBinding grants to the leader-election-app service account. That service account is the identity mounted into the pod, which gives the application permission to access the Kubernetes API to retrieve the relevant information and to manipulate the k8s resources accordingly.\nLease API - just a lightweight way to do heartbeats? 
# We can mostly ignore the implementation details of this function, but if we dig into why the RBAC spec needs what it needs, we stumble on a question: what is this Lease API and what is it for?\nIt is fairly easy to follow the client-go codebase to figure out what the RunOrDie function does. However, most of the codebase doesn\u0026rsquo;t explain why the Lease API even exists and how it helps with the whole leader election business.\nBased on the following GitHub issues, my guess is that the Lease API was created as a lightweight API for heartbeat checks (part of the leader election mechanism): if an \u0026ldquo;application/node\u0026rdquo; fails to renew and extend the deadline of its leadership, it is assumed to have failed, and a new election takes place to figure out who should be the leader next. A Lease object only carries a small spec (holderIdentity, leaseDurationSeconds, acquireTime, renewTime, leaseTransitions), so renewing it is cheap; the kubelet uses the same API for node heartbeats, renewing one Lease per node in the kube-node-lease namespace.\nReferences:\nhttps://github.com/kubernetes/kubernetes/issues/14733\nhttps://github.com/kubernetes/kubernetes/issues/80289\nI guess to properly understand it, one would need to read the implementation of the Lease API or the KEP documentation that mentions it, but that should be covered in another blog post.\nFuture work # I will probably build out this application further to attempt to replicate a small NoSQL database (but a very, very bad version of it); maybe it can store data in JSON format. Some of the things to look at and build out:\nData to be written to file storage can be sent to any pod within the StatefulSet, but it\u0026rsquo;ll be redirected to the leader pod Data is replicated, mirrored and shared accordingly (consistent hashing mechanism) Any pod can be used for reading except for the leader. 
The leader would redirect read requests to the followers, since it should be busy writing data into the storage backend Eventually, I\u0026rsquo;d also want to explore implementing this leader election without the Kubernetes mechanism, probably using one of the various Golang Raft libraries.\n","date":"28 August 2022","externalUrl":null,"permalink":"/leader-election-in-kubernetes-via-kubernetes-configmaps-and-leases/","section":"Posts","summary":"The leader election mechanism is a somewhat complex thing to code up for an application. There are various Golang libraries that assist with this, but it would be nicer if the environment that the application operates in could help with this. In the Kubernetes ecosystem, we can rely on the Kubernetes API server (backed by etcd) to do this leader election dance on our behalf. If we can tap into this mechanism, we can avoid introducing that complexity into our application.\n","title":"Leader Election in Kubernetes via Kubernetes Configmaps and Leases","type":"posts"},{"content":"The whole process of profiling an application is an attempt to identify hotspots within the application that consume more resources or take too much time; knowing this helps us figure out how to improve the code so that the applications we build consume fewer resources or respond better to external inputs. Profiling is another way to improve observability of an application\u0026rsquo;s performance, on top of the usual tooling such as distributed traces, metrics and logs. Distributed traces, metrics and logs can only capture part of the picture of how an application performs within an environment, and profiling is different. 
Profiling points out what is happening \u0026ldquo;internally\u0026rdquo; within the application, such as the amount of memory allocated by particular functions and how much CPU time a particular function takes, thereby providing even more visibility into how the application works.\nUnfortunately, I don\u0026rsquo;t work in the performance analysis space much, so I don\u0026rsquo;t fully understand the various tools that can help with this, but I do know that in the Golang programming language, one can use a tool called pprof to do the \u0026ldquo;profiling\u0026rdquo; of the application as mentioned above. These profiles are generally collected as a one-off, usually when \u0026ldquo;stuff\u0026rdquo; happens in production; e.g. the application crashes or becomes unresponsive on certain endpoints. However, since such information is collected as one-off pieces of data, we as engineers could easily miss the \u0026ldquo;ideal\u0026rdquo; moment to collect the information that would help us debug the situation much more easily. Imagine if we had a tool that could do it continuously on our behalf\u0026hellip;\nWithin the Google Cloud suite, there is a product called Cloud Profiler, which continuously captures profiles of applications at different points while they run. However, this tool mostly operates within the Google Cloud ecosystem, and not every company operates within that space, so it would be better if something in the open source community covered this need (similar to how Prometheus + Grafana compare to Google Cloud\u0026rsquo;s monitoring/metrics visualization system). If only there were a similar tool out there in the open source space.\nLuckily there is, and that tool\u0026rsquo;s name is Pyroscope. 
Refer to Pyroscope\u0026rsquo;s product page and documentation for details on it.\nRather than going through the benefits of the tool and its various use cases (which you can find on the product/documentation pages), this post focuses on the experience of getting it to work on Kubernetes (in particular GKE, but it should probably work on other Kubernetes distributions as well)\nThere are a few things to set up before we can finally demonstrate this working\nSetting the environment on Kubernetes cluster # This is one of the more \u0026ldquo;complex\u0026rdquo; bits as it involves a lot of moving pieces. Keep in mind that the approach used here is a bit on the \u0026ldquo;exploratory\u0026rdquo; side, since we even deploy components that one would normally expect to get from a public cloud (e.g. object storage)\nRefer to the following folder within that same repo: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/kubernetes\nThe setup is still somewhat unstable (it still fails once in a while, and I need to step in and manually correct the setup to get everything up and running). However, I do intend to eventually get to a state where this environment can be fully set up without any issues.\nThe first command to run from this folder is make cluster. This uses the gcloud command to create a GKE Standard cluster in the account that gcloud is logged in with. This post will not cover how to set up gcloud with the relevant credentials; refer to the following page for how to do this: https://cloud.google.com/sdk/docs/initializing\nThe next step is to ensure that all Helm repos are set up (we\u0026rsquo;re going to install a lot of Helm charts across many Helm repos). Run make environment to do so. 
Ensure that Helm 3 is installed on the machine where this command is run.\nThe last step to set up all the \u0026ldquo;observability\u0026rdquo; tools, including Pyroscope, is the make observability command, which installs the following:\nPrometheus (for scraping metric information from apps) Minio (open source object storage - still experimenting with operating it) Loki (log collection tool) Promtail (tool to retrieve logs from Kubernetes workloads and push them to Loki (or equivalent)) Pyroscope (continuous profiling tool) Tempo (distributed tracing tool) Grafana (visualization tool) If all goes well, the command runs without issue and we have our entire observability stack running, with Pyroscope included.\nTo make it easy for new applications to have their pprof profiles retrieved as well, it is best to avoid static configuration. We can set up scrape configurations that have Pyroscope query the Kubernetes API for pods with particular annotations and then scrape the pprof profiles of those pods.\nFor the scrape configurations:\n(Do not rely too much on this code snippet - when the Pyroscope tool changes, this configuration might no longer work)\nrbac: create: true pyroscopeConfigs: log-level: debug scrape-configs: # Example scrape config for pods # # The relabeling allows the actual pod scrape endpoint to be configured via the # following annotations: # # * `pyroscope.io/scrape`: Only scrape pods that have a value of `true`. # * `pyroscope.io/application-name`: Name of the application being profiled. # * `pyroscope.io/scheme`: If the metrics endpoint is secured then you will need # to set this to `https` \u0026amp; most likely set the `tls_config` of the scrape config. # * `pyroscope.io/port`: Scrape the pod on the indicated port. # * `pyroscope.io/profile-{profile_name}-path`: Specifies URL path exposing pprof profile.
# * `pyroscope.io/profile-{profile_name}-param-{param_key}`: Overrides scrape URL parameters. # # Kubernetes labels will be added as Pyroscope labels on metrics via the # `labelmap` relabeling action. - job-name: \u0026#39;kubernetes-pods\u0026#39; enabled-profiles: [cpu, mem] kubernetes-sd-configs: - role: pod relabel-configs: - source-labels: [__meta_kubernetes_pod_annotation_pyroscope_io_scrape] action: keep regex: true - source-labels: [__meta_kubernetes_pod_annotation_pyroscope_io_application_name] action: replace target-label: __name__ - source-labels: [__meta_kubernetes_pod_annotation_pyroscope_io_spy_name] action: replace target-label: __spy_name__ - source-labels: [__meta_kubernetes_pod_annotation_pyroscope_io_scheme] action: replace regex: (https?) target-label: __scheme__ - source-labels: [__address__, __meta_kubernetes_pod_annotation_pyroscope_io_port] action: replace regex: ([^:]+)(?::\\d+)?;(\\d+) replacement: $1:$2 target-label: __address__ - action: labelmap regex: __meta_kubernetes_pod_label_(.+) - source-labels: [__meta_kubernetes_namespace] action: replace target-label: kubernetes_namespace - source-labels: [__meta_kubernetes_pod_name] action: replace target-label: kubernetes_pod_name - source-labels: [__meta_kubernetes_pod_phase] regex: Pending|Succeeded|Failed|Completed action: drop - action: labelmap regex: __meta_kubernetes_pod_annotation_pyroscope_io_profile_(.+) replacement: __profile_$1 This is based on the Kubernetes example in the Pyroscope repository: https://github.com/pyroscope-io/pyroscope/tree/main/examples/golang-pull/kubernetes\nWithin the deploy.yaml file, you will notice the following annotations:\n...// Within the deployment spec template: metadata: labels: run: app3 annotations: pyroscope.io/scrape: \u0026#39;true\u0026#39; pyroscope.io/application-name: \u0026#39;app3\u0026#39; pyroscope.io/profile-cpu-enabled: \u0026#39;true\u0026#39; pyroscope.io/profile-mem-enabled: \u0026#39;true\u0026#39; pyroscope.io/port:
\u0026#39;8080\u0026#39; spec: containers: - image: full-observability:v5 ... Application with pprof endpoints available # First, we need an application that exposes pprof data externally. This is fairly easy to do, with various examples available on the web: Go\u0026rsquo;s net/http/pprof package provides handlers that serve debug endpoints for the full list of profiles, e.g. CPU, heap, goroutines etc.\nIn the case where the application already serves HTTP (for example, because it is a web server), we can simply register the debug endpoints on the existing router so that they are served alongside the API endpoints.\nr := mux.NewRouter() r.HandleFunc(\u0026#34;/\u0026#34;, handler) r.Handle(\u0026#34;/healthz\u0026#34;, StatusHandler{StatusType: \u0026#34;healthz\u0026#34;}) r.Handle(\u0026#34;/readyz\u0026#34;, StatusHandler{StatusType: \u0026#34;readyz\u0026#34;}) r.Handle(\u0026#34;/metrics\u0026#34;, promhttp.Handler()) // Profiling endpoints r.HandleFunc(\u0026#34;/debug/pprof/\u0026#34;, pprof.Index) r.Handle(\u0026#34;/debug/pprof/allocs\u0026#34;, pprof.Handler(\u0026#34;allocs\u0026#34;)) r.Handle(\u0026#34;/debug/pprof/goroutine\u0026#34;, pprof.Handler(\u0026#34;goroutine\u0026#34;)) r.Handle(\u0026#34;/debug/pprof/heap\u0026#34;, pprof.Handler(\u0026#34;heap\u0026#34;)) r.Handle(\u0026#34;/debug/pprof/mutex\u0026#34;, pprof.Handler(\u0026#34;mutex\u0026#34;)) r.HandleFunc(\u0026#34;/debug/pprof/profile\u0026#34;, pprof.Profile) All of the above pprof endpoints are served on the same port, which in the case of this example application is port 8080.\nWe can test that the profiles can be obtained by running the application locally and hitting the debug pprof endpoints; this should let us download the profiles.
If we wish to visualize the gathered profiles, we can do so via the following:\ngo tool pprof -http=localhost:8500 http://localhost:8080/debug/pprof/heap The application is deployed to a Kubernetes cluster, which is why there is a deploy.yaml manifest with a kustomization.yaml file alongside it, to handle cases where we need to change the image values in the deployment spec of deploy.yaml. However, to capture the profiles (as well as other debugging information), we need to set up all the relevant tooling for it.\nTo deploy the application above, we first need to build its Docker image. This can be done via docker build -t gcr.io/\u0026lt;project id\u0026gt;/full-observability:0.0.1 . (the trailing dot passes the current directory as the build context).\nNext, push it to a container registry. In the case of this project, we push it to Google Container Registry, which can be done via docker push gcr.io/\u0026lt;project id\u0026gt;/full-observability:0.0.1. Don\u0026rsquo;t forget to set up the credentials for this by following this guide: https://cloud.google.com/container-registry/docs/advanced-authentication\nThe next step is to actually run the application: it runs the same container under 3 different names (since it\u0026rsquo;s also configured to demonstrate distributed tracing). Run the following command: kustomize build ./deployment | kubectl apply -f -\nThe full Go source lives in the same folder as the following file: https://github.com/hairizuanbinnoorazman/Go_Programming/blob/master/Web/fullObservability/main.go\nA successful deployment? # When everything above has been successfully installed on the cluster, we can run make access-grafana from the following folder: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/kubernetes.
This sets up port forwarding to the Grafana service (which is not exposed outside the cluster), so we can access it from our local workstation via port 3000.\nWith that, we can make a simple pprof dashboard with the above setup, and it\u0026rsquo;ll probably look like the following:\nFor the Pyroscope server itself, the pprof profiles look like this:\nFor the app itself, the pprof profiles look like this:\nThe dashboard shown was created manually (I will look into adding it as a proper dashboard in the future).\nSome observations # These are some personal takeaways from getting Pyroscope working:\nThe versions of all Helm charts to be installed need to be pinned. While attempting to run this, I realized that the Helm chart configurations used for the observability environment were outdated, and the newer Helm charts no longer work with the older configurations. There seems to be no way to list the applications available for viewing in the dashboard. The configurable appname at the top of the pictures above is merely a hack: the value is not obtained via a query but typed in manually by the user, and the dashboard merely appends that variable to draw all of the profiles. Hopefully this janky hack will be resolved in the future, although it is still a workable solution for now.
The kube-prometheus-stack chart is not as flexible as I thought; certain features that I need, such as the Pyroscope plugins, seem to be hard to inject, so I had to disable Grafana in that chart and install a separate Grafana instance for it. ","date":"5 August 2022","externalUrl":null,"permalink":"/continuous-profiling-of-applications-in-kubernetes-via-pyroscope/","section":"Posts","summary":"The whole process of profiling an application is an attempt to identify hotspots within the application which consumes more resources or takes too much time - knowing this would allow us to identify how to further improve the code within the applications that we build in order to build applications that consume less resources or would respond better to external inputs. Profiling of an application is just another aspect to improve observability of application’s performance on top of the common usual tooling such as distributed traces, metrics and logs. Tools such as distributed traces, metrics and logs only can capture part of the picture of how an application performs within an environment but is different for profiling.
Profiling would point out what is happening “internally” within the application such as amount of memory being allocated for particular functions, how much CPU time is being taken for a particular function, thereby providing even more visibility to how the application works.\n","title":"Continuous Profiling of Applications in Kubernetes via Pyroscope","type":"posts"},{"content":"A friend of mine once mentioned that one of the tasks he had to do during his programming days was to build a server that would respond to the redis-cli tool, and I started to think - \u0026ldquo;that\u0026rsquo;s something I\u0026rsquo;ve never done before\u0026hellip; I wonder how hard it is?\u0026rdquo; After a day of tinkering around, it\u0026rsquo;s definitely not something \u0026ldquo;intuitive\u0026rdquo; to get done immediately; there are some concepts that I\u0026rsquo;m not super clear about, but it\u0026rsquo;s something that can be slowly built out while learning the various concepts.\nThere are some good learnings to be had while building such a server.\nCommon server examples generally assume that clients and servers interact via already established protocols such as JSON over HTTP or gRPC. There are plenty of practical examples for handling such incoming traffic, but very few examples of how one can build a server that speaks the \u0026ldquo;Redis protocol\u0026rdquo; (RESP) coming from the redis-cli tool. This is the main reference document when trying to build it: https://redis.io/docs/reference/protocol-spec/ It finally brings up the kind of \u0026ldquo;algorithm\u0026rdquo; questions from computer science classes that throw the most awkward arrays at you and ask you to solve them.
One such question I\u0026rsquo;ve seen before goes something like: given an array where the first item states how many items the array contains, with metadata and data interleaved in the rest of the array, compute a response for it. (This is just a vague recollection of such computer science questions.) In the case of Redis, an example would be [\u0026quot;*1\u0026quot;, \u0026quot;$4\u0026quot;, \u0026quot;ping\u0026quot;]. The first item indicates an array of \u0026ldquo;one\u0026rdquo; element to be processed. The second item indicates the length of that element in bytes; per the protocol spec, this length prefix is what makes bulk strings binary-safe, because the reader knows exactly how many bytes to read even if the data itself contains line breaks. The third item is the data itself, \u0026ldquo;ping\u0026rdquo;. On the wire, each of these items is terminated by a CRLF. The first piece of data tends to be the \u0026ldquo;command\u0026rdquo; that we want the server to handle - ping is one example, but we could have get, set, lpush etc. With that out of the way, here is some sample code in Golang for a fake Redis server that responds to ping, set, get, lpush and lrange.
(Responds but may give the wrong answer at times?)\npackage main import ( \u0026#34;bufio\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;net\u0026#34; \u0026#34;strconv\u0026#34; ) func main() { fmt.Println(\u0026#34;Start server\u0026#34;) defer fmt.Println(\u0026#34;Stop server\u0026#34;) ln, _ := net.Listen(\u0026#34;tcp\u0026#34;, \u0026#34;:6379\u0026#34;) sstore := map[string]string{} store := map[string][]string{} for { conn, _ := ln.Accept() scanner := bufio.NewScanner(conn) c := Command{sstore: sstore, store: store, conn: conn} inputEntry := 0 for { if ok := scanner.Scan(); !ok { break } rawInput := scanner.Text() fmt.Println(rawInput) if inputEntry == 2 { c.name = rawInput } if inputEntry == 4 { c.listName = rawInput } if inputEntry == 6 { c.itemVal = rawInput } if inputEntry == 8 { c.itemVal2 = rawInput } c.Run() inputEntry = inputEntry + 1 } conn.Close() } } type Command struct { sstore map[string]string store map[string][]string conn net.Conn name string listName string itemVal string itemVal2 string } func (c *Command) Run() { if c.name == \u0026#34;ping\u0026#34; { PrintPong(c.conn) } if c.name == \u0026#34;set\u0026#34; \u0026amp;\u0026amp; c.listName != \u0026#34;\u0026#34; \u0026amp;\u0026amp; c.itemVal != \u0026#34;\u0026#34; { c.sstore[c.listName] = c.itemVal c.conn.Write([]byte(\u0026#34;+OK\\r\\n\u0026#34;)) } if c.name == \u0026#34;get\u0026#34; \u0026amp;\u0026amp; c.listName != \u0026#34;\u0026#34; { val := c.sstore[c.listName] processed := fmt.Sprintf(\u0026#34;$%v\\r\\n%v\\r\\n\u0026#34;, len(val), val) c.conn.Write([]byte(processed)) } if c.name == \u0026#34;lpush\u0026#34; \u0026amp;\u0026amp; c.listName != \u0026#34;\u0026#34; \u0026amp;\u0026amp; c.itemVal != \u0026#34;\u0026#34; { c.store[c.listName] = append([]string{c.itemVal}, c.store[c.listName]...)
c.conn.Write([]byte(fmt.Sprintf(\u0026#34;:%v\\r\\n\u0026#34;, len(c.store[c.listName])))) } if c.name == \u0026#34;lrange\u0026#34; \u0026amp;\u0026amp; c.listName != \u0026#34;\u0026#34; \u0026amp;\u0026amp; c.itemVal != \u0026#34;\u0026#34; \u0026amp;\u0026amp; c.itemVal2 != \u0026#34;\u0026#34; { // c.itemVal is the start index // c.itemVal2 is the end index (inclusive) - a negative or out-of-range end index returns everything from the start index startIdx, _ := strconv.Atoi(c.itemVal) endIdx, _ := strconv.Atoi(c.itemVal2) items := c.store[c.listName] if startIdx \u0026gt;= len(items) { c.conn.Write([]byte(\u0026#34;*0\\r\\n\u0026#34;)) return } else if endIdx \u0026lt; len(items) \u0026amp;\u0026amp; endIdx \u0026gt;= 0 { items = items[startIdx : endIdx+1] } else if endIdx \u0026gt;= len(items) || endIdx \u0026lt; 0 { items = items[startIdx:] } processed := fmt.Sprintf(\u0026#34;*%v\\r\\n\u0026#34;, len(items)) for _, j := range items { processed = processed + fmt.Sprintf(\u0026#34;$%v\\r\\n%v\\r\\n\u0026#34;, len(j), j) } fmt.Println(processed) c.conn.Write([]byte(processed)) } } func PrintPong(conn net.Conn) { conn.Write([]byte(\u0026#34;+PONG\\r\\n\u0026#34;)) } A few things to note about the above code:\nIt does not support the interactive mode of redis-cli (just typing redis-cli in a terminal) It supports very few commands (and some may not even give accurate responses) It does not validate inputs (e.g. for the lrange command we expect 3 inputs: the list name, the start index and the end index.
If incomplete inputs are provided, the current code simply \u0026ldquo;hangs\u0026rdquo;) ","date":"20 July 2022","externalUrl":null,"permalink":"/fake-redis-server-built-with-golang/","section":"Posts","summary":"A friend of mine once mentioned about one of the tasks that he had to go through during his programming days was to build out a server which would respond to the redis-cli tool and I started to think - “that’s something I’ve never done before… I wonder how hard it is?” After a day of tinkering around - it’s definitely something that’s not “intuitive” to immediately get done; there are definitely some concepts that I’m not super clear about but it’s definitely something that can be slowly built out while learning various concepts.\n","title":"Fake Redis Server built with Golang","type":"posts"},{"content":"Over the past month, I decided to go down the rabbit hole of exploring an example of a self-balancing tree data structure. I generally don\u0026rsquo;t need to handle data structures on a day-to-day basis - I mostly deal with integrating tools and deploying them into Kubernetes clusters. However, even though I don\u0026rsquo;t deal with that side of things, I do find some of the thought processes behind data structures and algorithms pretty interesting. (I\u0026rsquo;m still kind of waiting for a moment where I can actually use them in my work for real.)\nA self-balancing binary search tree is an extension of the usual binary search tree. A binary search tree arranges incoming data so that the values can be printed in sorted order straight away. However, the plain binary search tree comes with its own weakness, which can be demonstrated with the following example.\nLet\u0026rsquo;s say we have a binary tree where data that is less than or equal to the root is inserted to the left of the root and data that is greater than the root is inserted to the right.
If we insert data into the tree in the following order: 20, 30, 10, we get the following tree.\nWhen inserting into such a tree or searching it for an element, we would ideally hit a time complexity of O(log(n)).\nHowever, if the order of the input changes to 10, 20 and then 30, the tree (following the above logic) ends up with the following structure.\nIn the worst case, where the items fed to the tree are already sorted, our binary tree effectively degenerates into a singly linked list, and the time complexity becomes O(n).\nHow can we improve this? We can give the tree the capability to rebalance itself the moment it detects that it is imbalanced. One example of a self-balancing binary search tree is the AVL tree, named after its inventors (Adelson-Velsky and Landis); it keeps the heights of the left and right subtrees of every node within 1 of each other. The following video explains the concepts far better than this blog post could; this post focuses more on an attempt at implementing the AVL tree structure.\nThe first part is to define the \u0026ldquo;nodes\u0026rdquo; that make up the tree:\ntype Node struct { Value int Left *Node Right *Node } The next part is to create 2 printing functions. The first is used to check that, no matter the insertion order, an \u0026ldquo;inorder\u0026rdquo; printing of the tree always produces a sorted list.
(Inorder printing visits the left subtree first, then the node itself, and finally the right subtree.)\nfunc InorderPrint(root *Node) { if root == nil { return } if root.Left != nil { InorderPrint(root.Left) } fmt.Println(root.Value) if root.Right != nil { InorderPrint(root.Right) } } The other printing function that we need is a level-based printing function, which mostly serves debugging purposes (to see the number of levels in the tree as well as the data on each level)\nfunc PrintLevel(root *Node, currentLevel, level int) { if root == nil { return } if currentLevel == level { fmt.Println(root.Value) } PrintLevel(root.Left, currentLevel+1, level) PrintLevel(root.Right, currentLevel+1, level) } Naturally, we also need a function that returns the maximum depth of the tree (also for debugging, as well as to let us iterate over the levels with the PrintLevel function)\nfunc MaxDepth(root *Node) int { if root == nil { return 0 } numL := MaxDepth(root.Left) + 1 numR := MaxDepth(root.Right) + 1 if numL \u0026gt;= numR { return numL } return numR } Once we have all the above functions, we can finally move on to the most critical bit, which is the Insert function.
There are a few things that we need to handle (do make sure to watch the YouTube video above, since most of the algorithm implemented here is based on it)\nChecking for imbalance between the left-hand side and right-hand side of the tree LL rotation of tree nodes RR rotation of tree nodes LR rotation of tree nodes (complex scenario - imagine the data coming in as 30, 10 and lastly 20 - the tree has to be manipulated in a weird way to ensure balance) RL rotation of tree nodes (the mirror-image scenario - imagine the data coming in as 10, 30 and lastly 20) The algorithm implemented attempts to cover all of the above (there could be bugs, so take the codebase with a grain of salt)\nfunc Insert(root *Node, newNode *Node) *Node { if root == nil { return newNode } if newNode.Value \u0026lt;= root.Value { root.Left = Insert(root.Left, newNode) } else { root.Right = Insert(root.Right, newNode) } LH := MaxDepth(root.Left) RH := MaxDepth(root.Right) LHBalance := 0 RHBalance := 0 if root.Left != nil { LHBalance = MaxDepth(root.Left.Left) - MaxDepth(root.Left.Right) } if root.Right != nil { RHBalance = MaxDepth(root.Right.Left) - MaxDepth(root.Right.Right) } // Left hand side too heavy if (LH-RH) \u0026gt;= 2 \u0026amp;\u0026amp; LHBalance \u0026gt;= 0 { newRoot := root.Left root.Left = newRoot.Right newRoot.Right = root return newRoot } // Right hand side too heavy if (LH-RH) \u0026lt;= -2 \u0026amp;\u0026amp; RHBalance \u0026lt;= 0 { newRoot := root.Right root.Right = newRoot.Left newRoot.Left = root return newRoot } // Double rotation cases if (LH-RH) \u0026gt;= 2 \u0026amp;\u0026amp; LHBalance \u0026lt; 0 { newRoot := root.Left.Right root.Left.Right = newRoot.Left newRoot.Left = root.Left root.Left = newRoot.Right newRoot.Right = root return newRoot } // Double rotation cases if (LH-RH) \u0026lt;= -2 \u0026amp;\u0026amp; RHBalance \u0026gt; 0 { newRoot := root.Right.Left root.Right.Left = newRoot.Right newRoot.Right =
root.Right root.Right = newRoot.Left newRoot.Left = root return newRoot } return root } With that, we now have all the basic functionality we need to test the self-balancing binary tree. We can do so with the following Golang program:\n// This package is meant for building a self balancing BST (AVL) package main import \u0026#34;fmt\u0026#34; type Node struct { Value int Left *Node Right *Node } func InorderPrint(root *Node) { if root == nil { return } if root.Left != nil { InorderPrint(root.Left) } fmt.Println(root.Value) if root.Right != nil { InorderPrint(root.Right) } } func MaxDepth(root *Node) int { if root == nil { return 0 } numL := MaxDepth(root.Left) + 1 numR := MaxDepth(root.Right) + 1 if numL \u0026gt;= numR { return numL } return numR } func PrintLevel(root *Node, currentLevel, level int) { if root == nil { return } if currentLevel == level { fmt.Println(root.Value) } PrintLevel(root.Left, currentLevel+1, level) PrintLevel(root.Right, currentLevel+1, level) } func Insert(root *Node, newNode *Node) *Node { if root == nil { return newNode } if newNode.Value \u0026lt;= root.Value { root.Left = Insert(root.Left, newNode) } else { root.Right = Insert(root.Right, newNode) } LH := MaxDepth(root.Left) RH := MaxDepth(root.Right) LHBalance := 0 RHBalance := 0 if root.Left != nil { LHBalance = MaxDepth(root.Left.Left) - MaxDepth(root.Left.Right) } if root.Right != nil { RHBalance = MaxDepth(root.Right.Left) - MaxDepth(root.Right.Right) } // Left hand side too heavy if (LH-RH) \u0026gt;= 2 \u0026amp;\u0026amp; LHBalance \u0026gt;= 0 { newRoot := root.Left root.Left = newRoot.Right newRoot.Right = root return newRoot } // Right hand side too heavy if (LH-RH) \u0026lt;= -2 \u0026amp;\u0026amp; RHBalance \u0026lt;= 0 { newRoot := root.Right root.Right = newRoot.Left newRoot.Left = root return newRoot } // Double rotation cases if (LH-RH) \u0026gt;= 2 \u0026amp;\u0026amp; LHBalance \u0026lt; 0 { newRoot := root.Left.Right
root.Left.Right = newRoot.Left newRoot.Left = root.Left root.Left = newRoot.Right newRoot.Right = root return newRoot } // Double rotation cases if (LH-RH) \u0026lt;= -2 \u0026amp;\u0026amp; RHBalance \u0026gt; 0 { newRoot := root.Right.Left root.Right.Left = newRoot.Right newRoot.Right = root.Right root.Right = newRoot.Left newRoot.Left = root return newRoot } return root } func main() { aa := Node{Value: 30} bb := Node{Value: 20} cc := Node{Value: 10} dd := Node{Value: 15} ee := Node{Value: 17} ff := Node{Value: 18} zz := Insert(nil, \u0026amp;aa) zz = Insert(zz, \u0026amp;cc) zz = Insert(zz, \u0026amp;bb) zz = Insert(zz, \u0026amp;dd) for i := 1; i \u0026lt;= MaxDepth(zz); i++ { fmt.Printf(\u0026#34;Print level %v\\n\u0026#34;, i) PrintLevel(zz, 1, i) } fmt.Println(\u0026#34;Done\u0026#34;) zz = Insert(zz, \u0026amp;ee) zz = Insert(zz, \u0026amp;ff) InorderPrint(zz) fmt.Println(MaxDepth(zz)) for i := 1; i \u0026lt;= MaxDepth(zz); i++ { fmt.Printf(\u0026#34;Print level %v\\n\u0026#34;, i) PrintLevel(zz, 1, i) } } It\u0026rsquo;s a pretty interesting exercise, and it can certainly be extended in various directions to understand the algorithm/data structure further. It\u0026rsquo;s still a pity that I haven\u0026rsquo;t found a place to actually use it, so here\u0026rsquo;s hoping I come across such a situation in my day-to-day work.\n","date":"7 July 2022","externalUrl":null,"permalink":"/coding-out-self-balancing-tree-data-structures/","section":"Posts","summary":"Over the past month, I decided to go down the rabbit hole of exploring an example of a self balancing tree data structure. I generally don’t need to handle data structures on a day to day basis - I mostly deal with integration of tools as well as deployment of tools into a Kubernetes cluster. However, even if I don’t deal with that side of things, I do find that some of the thought process behind the data structures and algorithms are pretty interesting.
(I’m still kind of waiting for a moment where I can actually utilize it in my work for real in a way)\n","title":"Coding out Self Balancing Tree data structures","type":"posts"},{"content":"Google Sites now allows one to embed JavaScript snippets into a site, which opens up some interesting new capabilities for websites built with Google Sites. This post is a simple example of getting the same functionality provided in the BMI Calculator page.\nWe can copy the generated JavaScript from the repository:\nhttps://github.com/hairizuanbinnoorazman/blog/blob/master/layouts/shortcodes/bmi_calculator.html https://github.com/hairizuanbinnoorazman/blog/blob/master/static/toolsjs/bmicalculator.min.js\nThe JavaScript here was compiled from Elm code. The first link above is the HTML that calls the required JavaScript functionality. Here, we skipped the CSS bit - I personally find beautifying forms a tad unnecessary, but to each their own. The second link is the actual JavaScript that provides the functionality coded in Elm.\nTaking the above code snippets, we can come up with the following snippet that can be embedded into a Google Sites page:\n\u0026lt;div id=\u0026#34;bmi-calculator\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;https://www.hairizuan.com/css/bmi.css\u0026#34;\u0026gt; \u0026lt;script src=\u0026#34;https://www.hairizuan.com/toolsjs/bmicalculator.min.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;script\u0026gt; app = Elm.BMICalculator.init({ node: document.getElementById(\u0026#34;bmi-calculator\u0026#34;) }); \u0026lt;/script\u0026gt; Steps to embed code snippet # You can embed the code snippet by doing the following:\nFirst, we need to create a Google Site.
We then click to “embed” a new element on a page of the site.\nOn the Insert panel, use the Embed option, then choose the Embed code tab. After that, we can type or paste our custom HTML and JavaScript into the code box.\nUse the Next button to preview how your code will look.\nUse the Insert button to add the code to the page. If there isn\u0026rsquo;t a preview of the expected HTML, there is probably an error in the JavaScript or HTML that needs fixing. You can use the Edit code button (it looks like a pencil) that overlays the middle of the preview and correct the code so that it works.\nSome interesting points # An interesting point here is that rather than embedding the JavaScript code that encapsulates the functionality of the BMI calculator, we can just pull it in via a \u0026lt;script\u0026gt; tag pointing at a source that hosts the JavaScript code. (An example would be this website, which hosts that code snippet.)\nAnother interesting point is that the embedded code can be interleaved with other content. Above the calculator, we could have a paragraph explaining what the tool does, giving the reader context on what it does and how to interpret its output.\nAnother thing to note is that the HTML, JS \u0026amp; CSS embedded here do not need to call out to any server. The calculation is done entirely in JavaScript.
In the future, I could write a blog post showing how to create some HTML, JS and CSS that lets embedded code in Google Sites interact with an external 3rd party API - but that\u0026rsquo;s a story for another time.\nThis opens up a variety of interesting use cases for Google Sites: technically, a non-developer could embed any functionality that interacts with a custom API (making it pretty extensible). This might prove useful for \u0026ldquo;internal\u0026rdquo; work blogs, as an example use case.\n","date":"20 June 2022","externalUrl":null,"permalink":"/custom-js-snippets-in-google-sites/","section":"Posts","summary":"Google sites now allow one to embed Javascript snippets into a site; thereby providing some interesting new capabilities with websites built with Google sites. The post here is a simple example of getting the same functionality provided in the BMI Calculator page.\n","title":"Custom JS Snippets in Google Sites","type":"posts"},{"content":"","date":"20 June 2022","externalUrl":null,"permalink":"/categories/elm/","section":"Article Categories","summary":"","title":"Elm","type":"categories"},{"content":"","date":"20 June 2022","externalUrl":null,"permalink":"/tags/elm/","section":"Technology Tags","summary":"","title":"Elm","type":"tags"},{"content":"","date":"20 June 2022","externalUrl":null,"permalink":"/categories/google-sites/","section":"Article Categories","summary":"","title":"Google-Sites","type":"categories"},{"content":"","date":"20 June 2022","externalUrl":null,"permalink":"/tags/google-sites/","section":"Technology Tags","summary":"","title":"Google-Sites","type":"tags"},{"content":"While playing around with container technologies such as Docker and Kubernetes, one critical component that comes up over and over again is managing network connections to the containers.
Taking Kubernetes as an example, the networking stack is handled by components that interface with CNI as well as kube-proxy. In this post, we\u0026rsquo;ll focus on the Linux feature that kube-proxy relies on in one of its modes: iptables.\nIntroduction # According to Wikipedia: \u0026ldquo;iptables is a user-space utility program that allows a system administrator to configure the IP packet filter rules of the Linux kernel firewall\u0026rdquo;.\niptables has 5 different tables, but we generally only concern ourselves with 2 of them (NAT and FILTER); the rest are for more specific use cases. Refer to https://wiki.archlinux.org/title/iptables for details:\nNAT FILTER RAW MANGLE SECURITY iptables -L -v Chain INPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination iptables -A INPUT -s \u0026#34;\u0026lt;IP ADDRESS\u0026gt;\u0026#34; -j DROP Refer to the following reference: https://web.mit.edu/rhel-doc/4/RH-DOCS/rhel-rg-en-4/s1-iptables-options.html\nBlocking access to nginx # iptables -A INPUT -p tcp --dport 80 -s X.X.X.X -j DROP This blocks port 80 for source IP X.X.X.X by dropping its network packets. 
Alternatively, we can \u0026ldquo;reject\u0026rdquo; the packets instead (REJECT sends an error response back to the sender, whereas DROP silently discards the packets):\niptables -A INPUT -p tcp --dport 80 -s X.X.X.X -j REJECT Redirect port # iptables -t nat -A PREROUTING -p tcp --dport 8080 -j REDIRECT --to-port 80 From the following link: https://askubuntu.com/questions/444729/redirect-port-80-to-8080-and-make-it-work-on-local-machine\nPackets meant for the loopback interface don\u0026rsquo;t go through the PREROUTING chain.\nSo the redirect works for traffic from external machines, but not for connections made from the machine itself.\nTo make it work with localhost, we need to add the following rule to the OUTPUT chain of the nat table:\niptables -t nat -A OUTPUT -o lo -p tcp --dport 8080 -j REDIRECT --to-port 80 Cleanup # A default GCE VM instance doesn\u0026rsquo;t have any initial iptables ruleset (its traffic is governed by Google\u0026rsquo;s networking stack instead, e.g. firewall rules applied to virtual machines via network tags), so we can flush the rules from both tables:\niptables -F iptables -t nat -F References # The following links can be useful when working with iptables:\nhttps://www.thegeekstuff.com/2011/06/iptables-rules-examples/\nAfter installing Docker # (Still researching - do not use as reference)\nInstalling Docker\nsudo apt-get update sudo apt-get install -y \\ ca-certificates \\ curl \\ gnupg \\ lsb-release curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg echo \\ \u0026#34;deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian \\ $(lsb_release -cs) stable\u0026#34; | sudo tee /etc/apt/sources.list.d/docker.list \u0026gt; /dev/null sudo apt-get update sudo apt-get install -y docker-ce docker-ce-cli containerd.io iptables -nvL Chain INPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy DROP 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 0 0 DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0 0 0 DOCKER-ISOLATION-STAGE-1 all -- * * 
0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED 0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0 Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain DOCKER (1 references) pkts bytes target prot opt in out source destination Chain DOCKER-ISOLATION-STAGE-1 (1 references) pkts bytes target prot opt in out source destination 0 0 DOCKER-ISOLATION-STAGE-2 all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 Chain DOCKER-ISOLATION-STAGE-2 (1 references) pkts bytes target prot opt in out source destination 0 0 DROP all -- * docker0 0.0.0.0/0 0.0.0.0/0 0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 Chain DOCKER-USER (1 references) pkts bytes target prot opt in out source destination 0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 iptables -L -v -n -t nat Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 49 7688 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL Chain INPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 1 60 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0 Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 1 60 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL Chain DOCKER (2 references) pkts bytes target prot opt in out source destination 0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0 docker run -d -p 8080:80 --name=lol nginx Chain DOCKER (1 references) pkts bytes target prot opt in out source destination 0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.2 tcp dpt:80 Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes) pkts bytes 
target prot opt in out source destination 1 60 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0 0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80 ... Chain DOCKER (2 references) pkts bytes target prot opt in out source destination 0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0 0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80 docker run -d --name=lol nginx iptables -A DOCKER ! -i docker0 -o docker0 -p tcp --dport 80 -s 0.0.0.0/0 -d 172.17.0.2 -j ACCEPT iptables -t nat -j DNAT -A DOCKER -p tcp ! -i docker0 --dport 8080 --to-destination 172.17.0.2:80 iptables -t nat -j MASQUERADE -A POSTROUTING -s 172.17.0.2 -d 172.17.0.2 -p tcp --dport 80 iptables -D DOCKER 1 iptables -t nat -D DOCKER 2 iptables -t nat -D POSTROUTING 2 iptables -t nat -I PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80 iptables -t nat -I POSTROUTING -p tcp -s 172.17.0.2 -j SNAT --to 10.128.0.56 iptables -I DOCKER ! -o docker0 -p tcp --dport 80 -s 0.0.0.0/0 -d 172.17.0.2 iptables -t nat -A OUTPUT -o lo -p tcp --dport 30000 -j DNAT --to-destination 172.17.0.2:80 ","date":"5 June 2022","externalUrl":null,"permalink":"/experimenting-with-ip-tables/","section":"Posts","summary":"While playing around with container technologies such as docker and kubernetes, one critical component that kind of comes up over and over again is the whole portion about managing network connections to the containers. If we are to just take an example of Kubernetes - the networking stack is handled by technologies that would interface with CNI as well kube proxy. In this post, we’ll be focusing on the linux feature that kube proxy kind of rely on (one of the modes that it runs on) which is IP Tables.\n","title":"Experimenting with IP Tables","type":"posts"},{"content":"There is an old adage from security land that we should restrict access to resources/assets as much as we can. Users and applications should only access items that they need to operate themselves. 
Following this line of thought, if we deploy applications in a Kubernetes cluster, we should ensure that pods only accept traffic that has been explicitly declared as \u0026ldquo;required\u0026rdquo;. Is there a way to do so?\nNaturally, since this blog post covers it, there is a way to do so in Kubernetes. However, it is worth knowing that in the past, some people would have done this via service meshes (see projects like Istio for examples). This functionality is highlighted front and centre on the product pages of such service meshes (although other functionality is just as important, e.g. circuit breaking, rate limiting etc).\nHowever, let\u0026rsquo;s say we have the security requirement of restricting network traffic between pods (pods that need to communicate must declare it explicitly), but we don\u0026rsquo;t want to take on the heavy dependency of running a service mesh in our cluster. How can this be done?\nKubernetes provides a resource kind called NetworkPolicy. We can demonstrate it with the following example:\nSetting up a Cluster, Deployments and Services # The first step is to get a Kubernetes cluster, e.g. a Google Kubernetes Engine (GKE) cluster. The important bit is to create the GKE cluster with NetworkPolicy enforcement enabled. In the Google Cloud Console (UI), at the time of writing, this option can be found under \u0026ldquo;Networking\u0026rdquo;, one of the 5 sections of GKE cluster configuration (Automation, Networking, Security, Metadata and Features). 
We would then run the following set of commands:\n# Create a deployment with nginx image to be run in default namespace kubectl create deployment lol-default --image=nginx # Create a new namespace called yahoo kubectl create namespace yahoo # Create multiple deployments in yahoo namespace kubectl create deployment lol-yahoo --image=nginx -n yahoo kubectl create deployment miao-yahoo --image=nginx -n yahoo At the end of this, we would have 3 pods in 2 namespaces:\n# Get pods from default namespace NAME READY STATUS RESTARTS AGE lol-default-5db5d6874f-prknx 1/1 Running 0 73m # Get pods from yahoo namespace NAME READY STATUS RESTARTS AGE lol-yahoo-59d5c4d954-vkb7l 1/1 Running 0 70m miao-yahoo-b68745745-tvqh2 1/1 Running 0 64m Let\u0026rsquo;s expose the pods of the lol-default deployment via a service.\nkubectl expose deployment lol-default --port=80 If we now exec into the lol-yahoo pod and query the lol-default pods via the service, the request succeeds:\n# Format: kubectl exec -it \u0026lt;pod-name\u0026gt; -n yahoo -- /bin/bash kubectl exec -it lol-yahoo-59d5c4d954-vkb7l -n yahoo -- /bin/bash Within the container:\ncurl lol-default.default.svc It should return the following:\n\u0026lt;!DOCTYPE html\u0026gt; \u0026lt;html\u0026gt; \u0026lt;head\u0026gt; \u0026lt;title\u0026gt;Welcome to nginx!\u0026lt;/title\u0026gt; \u0026lt;style\u0026gt; html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } \u0026lt;/style\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;h1\u0026gt;Welcome to nginx!\u0026lt;/h1\u0026gt; \u0026lt;p\u0026gt;If you see this page, the nginx web server is successfully installed and working. 
Further configuration is required.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;For online documentation and support please refer to \u0026lt;a href=\u0026#34;http://nginx.org/\u0026#34;\u0026gt;nginx.org\u0026lt;/a\u0026gt;.\u0026lt;br/\u0026gt; Commercial support is available at \u0026lt;a href=\u0026#34;http://nginx.com/\u0026#34;\u0026gt;nginx.com\u0026lt;/a\u0026gt;.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;\u0026lt;em\u0026gt;Thank you for using nginx.\u0026lt;/em\u0026gt;\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; Restricting the connections with NetworkPolicy # Let\u0026rsquo;s try locking it down now. The first step is to create a NetworkPolicy that denies all ingress traffic to every pod in the default namespace:\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-ingress namespace: default spec: podSelector: {} policyTypes: - Ingress Note the empty podSelector: it selects all pods in the namespace that this NetworkPolicy is in. Also, since the policy lists Ingress under policyTypes but provides no ingress rules, all ingress connections to those pods are denied (egress is unaffected).\nIf we now curl lol-default.default.svc from the lol-yahoo pod, we can no longer connect; the command just hangs because the traffic is blocked.\nLet\u0026rsquo;s say we want only lol-yahoo, but not miao-yahoo, to be able to connect to lol-default. 
How can we set such a configuration up?\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-lol-yahoo-lol-default namespace: default spec: podSelector: matchLabels: app: lol-default policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: yahoo podSelector: matchLabels: app: lol-yahoo Let\u0026rsquo;s go through what this NetworkPolicy means, step by step:\nThe NetworkPolicy is created in the default namespace, so it can only apply to pods within this namespace. spec.podSelector is not empty: the policy applies only to pods with the label app: lol-default. Any other pods in the default namespace are still covered by our earlier default-deny NetworkPolicy. This time we declared spec.ingress. Rules defined here are \u0026ldquo;allow\u0026rdquo; rules: they list the pods (or IP ranges) that are allowed to reach the selected pods. The entry in spec.ingress[0].from defines both namespaceSelector and podSelector, so both must match: traffic must come from a pod labelled app: lol-yahoo in a namespace labelled kubernetes.io/metadata.name: yahoo. If namespaceSelector were omitted and only podSelector were set, the entry would only match pods in the NetworkPolicy\u0026rsquo;s own namespace (default). For more details on this, it\u0026rsquo;s best to refer to the Kubernetes documentation: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#networkpolicypeer-v1-networking-k8s-io With the above NetworkPolicy in place, we should be able to run curl from the lol-yahoo pod but not from the miao-yahoo pod, since lol-yahoo is the pod that has the label app: lol-yahoo and runs in the yahoo namespace.\nRefer to the following page as well for more details on NetworkPolicy and its usage: https://kubernetes.io/docs/concepts/services-networking/network-policies/\nConclusion # NetworkPolicy is an interesting exercise in limiting traffic between pods; however, it comes with its own set of limitations (refer to the link above). 
Many of these limitations suggest that users who require such features should go for a service mesh instead, so depending on one\u0026rsquo;s requirements, it may be worth exploring service meshes such as Istio or Gloo.\nA random thought that came up: NetworkPolicy, or any other technology that requires application developers to explicitly declare which applications can access their service, adds administrative load on developers. I doubt there is an automated way to do this, but I look forward to any interesting developments in this space.\n","date":"15 May 2022","externalUrl":null,"permalink":"/restricting-connections-between-pods-in-a-kubernetes-cluster-network-policy/","section":"Posts","summary":"There is an old adage from security land that we should restrict access to resources/assets as much as we can. Users and applications should only access items that they need to operate themselves. Following this line of thought, that would mean that if we are to deploy application in a Kubernetes Cluster, we should ensure that pods should only accept communication that they’ve explicitly declared as “required”. Is there a way to do so?\n","title":"Restricting connections between pods in a Kubernetes Cluster (Network Policy)","type":"posts"},{"content":"This blog post provides some notes from an experiment I ran while playing with Google Cloud Platform. 
The purpose of this experiment was to do the following:\nHave 1 instance in VPC A This instance will be in a private VPC Nginx will be installed on this instance, so it requires internet access Have 1 other instance in VPC B This instance is able to reach the instance in VPC A via a DNS name (a friendly name rather than an IP address) Both instances are able to talk to each other Creating another VPC for testing # A new Google Cloud project only contains the \u0026ldquo;default\u0026rdquo; VPC. To experiment with VPC Peering and test connections between 2 VPCs, we need a second VPC, so we have to create one.\nAn important thing to note is that for VPC Peering to work in later stages, the IP ranges of the VPCs we\u0026rsquo;re connecting must not overlap.\nE.g. let\u0026rsquo;s say we\u0026rsquo;re using us-central1 in both VPCs. If we chose the \u0026ldquo;automatic\u0026rdquo; subnet creation mode when creating the new VPC, its us-central1 subnet would use the same CIDR as the one in the default VPC, which would cause VPC Peering to fail. It is best to use a custom setup.\nHere, I used the following parameters:\nVPC Name: testing Custom Subnet 1 Name: us-central1 Region: us-central1 CIDR: 10.129.0.0/20 Ensure that firewall rules, such as one allowing SSH, are created as well. The firewall rules should be similar to those in the \u0026ldquo;default\u0026rdquo; VPC, i.e. allowing traffic from all IPs (0.0.0.0/0), to reduce hassle during experimentation.\nCreating instances with no external IPs # I suppose one way to reduce the attack surface is to ensure that only instances that need to be exposed to the internet have external IPs. 
Otherwise, they should not be assigned an external IP, so they can\u0026rsquo;t be directly \u0026ldquo;accessed\u0026rdquo; from the internet.\nThis is done on the instance creation page:\nScroll down to \u0026ldquo;additional options\u0026rdquo; of \u0026ldquo;NETWORKING, DISKS, SECURITY, MANAGEMENT and SOLE-TENANCY\u0026rdquo; Scroll downwards to network dropdown tab Edit Network interfaces For 1 instance - Choose default VPC For the other - Choose testing VPC Both in US-central1 External IP - set to None for both cases No network tag is needed here After creating the instances, we can SSH into them from the Google Cloud Console UI and run a simple command that reaches out to the internet; we will find that we can\u0026rsquo;t connect to the internet (this is expected). For example, run the command below:\nsudo apt update The command will hang when it attempts to reach an external address, since the instance has no route to the internet.\nGiving the instance in the default VPC internet access # In this experiment, only the instance in the \u0026ldquo;default\u0026rdquo; VPC needs nginx installed. To do this, that instance needs internet access despite having no external IP address.\nTo do so, we need to set up a Cloud NAT, which is simple to do via the Google Cloud Console UI. Creating the Cloud NAT requires a Cloud Router; we can create a new Cloud Router from the same page.\nMost of the parameters during creation of the Cloud NAT and Cloud Router are just their names. Choose sensible names so that they make sense in reports.\nEach Cloud NAT gateway is tied to a single region of a single VPC, so for subnets in other regions (or other VPCs), we would need to create a separate NAT for each of them. 
(This might be troublesome operationally.)\nTo test, we can SSH back into the instance in the \u0026ldquo;default\u0026rdquo; VPC and run the command:\nsudo apt update \u0026amp;\u0026amp; sudo apt install -y nginx This time, it should work as expected (assuming the NAT was set up successfully).\nVPC Peering # We need to set up 2 VPC Peering links between the 2 VPCs:\nConnection from default VPC to testing VPC Connection from testing VPC to default VPC If either of those is missing, the VPC Peering will remain in an inactive state.\nTo test that VPC Peering is working, we can ping an instance in one VPC from an instance in the other VPC.\nE.g. if the default VPC has an instance with IP address 10.128.0.35, we can run the ping command from the instance in the testing VPC:\nping 10.128.0.35 If the setup is successful, we would get the following result:\nPING 10.128.0.35 (10.128.0.35) 56(84) bytes of data. 64 bytes from 10.128.0.35: icmp_seq=1 ttl=64 time=1.86 ms 64 bytes from 10.128.0.35: icmp_seq=2 ttl=64 time=0.299 ms Note that we cannot refer to instances by name across VPCs.\nE.g. if an instance in the \u0026ldquo;default\u0026rdquo; VPC is called \u0026ldquo;instance-1\u0026rdquo;, other instances in the same VPC can generally reach it by name (e.g. ping instance-1). However, such internal DNS names are apparently not resolved across VPC Peering. There is a question about it on Server Fault: https://serverfault.com/questions/1005112/gcp-how-to-do-dns-peering-between-2-vpcs-that-use-vpc-peering-in-the-same-proje\nThis makes things somewhat brittle, since we would have to refer to instances in the other VPC via IP addresses.\nProviding DNS across VPCs # I\u0026rsquo;m not sure if this is the right way to do things, but it is definitely one lazy way around this issue. We can rely on the Cloud DNS product, where we can register private DNS entries and make them visible to both VPCs. 
We can register each new instance, associating its address with a DNS name, by calling a gcloud command: https://cloud.google.com/sdk/gcloud/reference/dns/record-sets/create. This is done by creating an \u0026ldquo;A\u0026rdquo; record that maps a domain name to the instance\u0026rsquo;s IP address.\nE.g. let\u0026rsquo;s say the DNS zone we created in Cloud DNS uses the domain \u0026ldquo;example.com\u0026rdquo;. We can register an \u0026ldquo;A\u0026rdquo; record that maps \u0026ldquo;hoho.example.com\u0026rdquo; to the IP address 10.128.0.35. If we ping it from either VPC, it should work:\nPING hoho.example.com (10.128.0.35) 56(84) bytes of data. 64 bytes from 10.128.0.35 (10.128.0.35): icmp_seq=1 ttl=64 time=1.36 ms 64 bytes from 10.128.0.35 (10.128.0.35): icmp_seq=2 ttl=64 time=0.395 ms Since we\u0026rsquo;ve already installed nginx on the instance in the \u0026ldquo;default\u0026rdquo; VPC, we can run curl against the domain \u0026ldquo;hoho.example.com\u0026rdquo; from the instance in the \u0026ldquo;testing\u0026rdquo; VPC:\n# From instance in \u0026#34;testing\u0026#34; VPC curl hoho.example.com It should return the following output - essentially the standard nginx output:\n\u0026lt;!DOCTYPE html\u0026gt; \u0026lt;html\u0026gt; \u0026lt;head\u0026gt; \u0026lt;title\u0026gt;Welcome to nginx!\u0026lt;/title\u0026gt; \u0026lt;style\u0026gt; body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } \u0026lt;/style\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;h1\u0026gt;Welcome to nginx!\u0026lt;/h1\u0026gt; \u0026lt;p\u0026gt;If you see this page, the nginx web server is successfully installed and working. 
Further configuration is required.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;For online documentation and support please refer to \u0026lt;a href=\u0026#34;http://nginx.org/\u0026#34;\u0026gt;nginx.org\u0026lt;/a\u0026gt;.\u0026lt;br/\u0026gt; Commercial support is available at \u0026lt;a href=\u0026#34;http://nginx.com/\u0026#34;\u0026gt;nginx.com\u0026lt;/a\u0026gt;.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;\u0026lt;em\u0026gt;Thank you for using nginx.\u0026lt;/em\u0026gt;\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; ","date":"5 May 2022","externalUrl":null,"permalink":"/private-vpc-experimentation/","section":"Posts","summary":"This blog post is kind of a blog post that provide some notes of some experimentation that I encountered while playing with Google Cloud Platform. The purpose of this experimentation was to do the following:\n","title":"Private VPC Experimentation","type":"posts"},{"content":"There is a trend of container images that minimize image size by removing almost everything from the image. This helps kubelet download images onto the nodes more quickly and may further reduce the attack surface of the container (I suppose it\u0026rsquo;s harder to do things in a container if utilities like a shell or bash don\u0026rsquo;t exist within it). For containers that have had the shell/bash removed, you would probably see errors such as this:\nerror: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \u0026#34;cc558cb1b205490e0f5b604c06d542ea997748485ab1c869d97240e8b8792d77\u0026#34;: OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \u0026#34;/bin/bash\u0026#34;: stat /bin/bash: no such file or directory: unknown How do we get such a container? 
Let\u0026rsquo;s go step by step: create such a Go application, build a Docker image for it and then run it in the cluster.\nAn important note here is that the following files are for Go 1.14. Later versions of Go (1.16 onwards) default to module mode and require a go.mod file to be in place.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; ) func main() { port := 8080 http.HandleFunc(\u0026#34;/\u0026#34;, helloWorldHandler) log.Printf(\u0026#34;Server starting on port %v\\n\u0026#34;, port) log.Fatal(http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, port), nil)) } func helloWorldHandler(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;serving\u0026#34;, r.URL) fmt.Fprint(w, \u0026#34;This is a test. Hello World Miaoza!!\\n\u0026#34;) } Save the above code as main.go in a folder. This is a simple Go application with a single route. You can run it locally with go run main.go and then use curl/wget to fetch its responses.\nNext is the Docker image; we will use a distroless base image. Distroless images have the characteristics mentioned at the top of this post: as much of the container as possible is removed to reduce the image size as well as the attack surface. You can refer to the project here: https://github.com/GoogleContainerTools/distroless\nLet\u0026rsquo;s use the following Dockerfile to build our Docker image:\nFROM golang:1.14 as build WORKDIR /app ADD . . RUN CGO_ENABLED=0 go build -o app . FROM gcr.io/distroless/base-debian11:nonroot COPY --from=build /app/app /app EXPOSE 8080 CMD [\u0026#34;/app\u0026#34;] As mentioned, this uses the golang:1.14 Docker image to build the app. The app binary is then copied over to a Debian-based \u0026ldquo;nonroot\u0026rdquo; distroless image. 
Let\u0026rsquo;s save the file as Distroless.Dockerfile\nWe can build the Dockerfile and then run the resulting image using the following commands:\ndocker build -t testing -f Distroless.Dockerfile . docker run -p 8080:8080 --name testing testing The first command builds the Docker image and tags it with the name \u0026ldquo;testing\u0026rdquo;. The second command runs that \u0026ldquo;testing\u0026rdquo; image, mapping our host machine\u0026rsquo;s port 8080 to the container\u0026rsquo;s port 8080. To test that the application in the Docker image works, we can run curl against it:\ncurl localhost:8080 That should return the following response:\nThis is a test. Hello World Miaoza!! Normally, if we as developers would like to inspect what is going on within the container, we would run a shell and inspect the files within it. If we try to run a command to do so:\ndocker exec -it testing /bin/bash We would see this error instead:\nOCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \u0026#34;/bin/bash\u0026#34;: stat /bin/bash: no such file or directory: unknown This is expected from a container built on a distroless base: such utilities are not available. For debugging purposes, it might be better not to rely on distroless and instead use a plain old Debian image, which makes debugging easier locally.\nHowever, what if we need to debug it in a production setting? E.g. engineering management might mandate that every team in the company uses distroless base images. How do we debug this in production? Would it be possible?\nLet\u0026rsquo;s try to demonstrate this with this image on a Google Kubernetes Engine cluster.\nThe first step is to push the built image to Google Container Registry. 
We can do so by retagging the \u0026ldquo;testing\u0026rdquo; image with the appropriate tag as follows:\ndocker tag testing gcr.io/\u0026lt;project id\u0026gt;/distroless-hello-world:v1 We can then push the image into Google Container Registry (assuming that you have already done all the steps to authorize your workstation to push there)\ndocker push gcr.io/\u0026lt;project id\u0026gt;/distroless-hello-world:v1 The next step is to write a YAML file containing the Kubernetes Deployment manifest that gets our application into production. We apply the manifest by running kubectl apply as follows:\n# Assuming that the below file is called \u0026#34;secure.yaml\u0026#34; kubectl apply -f secure.yaml apiVersion: apps/v1 kind: Deployment metadata: name: distroless-helloworld-1 labels: run: helloworld-1 spec: replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: run: helloworld-1 strategy: type: RollingUpdate template: metadata: labels: run: helloworld-1 spec: securityContext: # https://kubesec.io/basics/containers-securitycontext-runasuser/ runAsUser: 20000 runAsGroup: 20000 fsGroup: 20000 containers: - image: gcr.io/\u0026lt;project id\u0026gt;/distroless-hello-world:v1 name: helloworld ports: - containerPort: 8080 securityContext: allowPrivilegeEscalation: false privileged: false runAsNonRoot: true readOnlyRootFilesystem: true capabilities: drop: - all restartPolicy: Always The Deployment above runs a pod that uses the image we built. For good measure, we added security options that basic web applications should be able to respect, such as not running in privileged mode, not running as root and not requiring any special Linux kernel capabilities. 
We can then check that the pod is running as follows:\nkubectl get pods This lists the pods in the current (default) namespace; if you run this on a \u0026ldquo;fresh\u0026rdquo; GKE cluster, it would show the following:\nNAME READY STATUS RESTARTS AGE distroless-helloworld-1-5d8dd7f664-xsvl2 1/1 Running 0 3m27s To check that the application is still working and serving the right traffic, we can set up a port forward:\n# Example format # kubectl port-forward \u0026lt;pod name\u0026gt; 8080:8080 kubectl port-forward distroless-helloworld-1-5d8dd7f664-xsvl2 8080:8080 We can then run curl against localhost:8080 to check that the application is serving traffic as expected.\nHowever, let\u0026rsquo;s say we run into a situation where we need to check the files of our application container. Can we run some sort of \u0026ldquo;shell\u0026rdquo; to do that? If we try to do so here:\nkubectl exec -it distroless-helloworld-1-5d8dd7f664-xsvl2 -- /bin/bash We would get the following error (as expected):\nerror: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \u0026#34;3929950fd0d4be8c20b2e4efd3db1693b59d665750954f2260b66bbc766d32f4\u0026#34;: OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \u0026#34;/bin/bash\u0026#34;: stat /bin/bash: no such file or directory: unknown This is the same kind of error shown at the top of this post. So, is there a mechanism that would allow us to do this sort of check?\nOne method right now is to run a kubectl debug command (or a variant of it). This component is still under active development, so its usefulness is not yet maximized, but in cases like this, it\u0026rsquo;s good enough. 
If we run the debug command as follows:\nkubectl debug distroless-helloworld-1-5d8dd7f664-xsvl2 -it --image=ubuntu --share-processes --copy-to=debugging-pod\nThis creates a new pod named debugging-pod, containing a copy of the application we\u0026rsquo;re trying to debug (it is a copy, not the same running pod) plus an additional ubuntu container. From the shell in that ubuntu container, we can then run the following command:\nps ax\nThis lists all processes in the whole pod (the --share-processes flag makes the containers in the pod share a single process namespace, which is what lets us see the processes of the other container):\nPID TTY      STAT   TIME COMMAND\n  1 ?        Ss     0:00 /pause\n  7 ?        Ssl    0:00 /app\n 16 pts/0    Ss     0:00 bash\n 25 pts/0    R+     0:00 ps ax\nFrom what we know, /app is the process that our main \u0026ldquo;app\u0026rdquo; Docker image is running. We can continue debugging by running curl commands locally or running other checks against the other container. We can even check the files in the other container, as follows:\n# Format:\n# cd /proc/\u0026lt;process id of /app\u0026gt;/root\ncd /proc/7/root\nThat puts us in the file system of the container that is running the /app command. This is useful for inspecting rendered configuration files, or for seeing how the application responds to live traffic and how it manipulates the file system.\nThe above is a tiny example of how Kubernetes continues to improve to make it easier to debug applications. Unfortunately, the debug subcommand still has issues here and there (in this example, we can\u0026rsquo;t debug the actual \u0026ldquo;live\u0026rdquo; pod; perhaps that would be possible by attaching a temporary container alongside the live container?). That functionality is still under development (it is possible to use, but it seems certain flags need to be turned on?
Or it could be that I misunderstand what functionality is being offered.)\nThe source code is also available in the following GitHub repo: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicWeb\n","date":"15 April 2022","externalUrl":null,"permalink":"/debugging-distroless-kubernetes-pods/","section":"Posts","summary":"There is a trend of images that follow the philosophy of minimizing the size of image by removing almost everything out of image. This helps with getting image downloaded more quickly by kubelet into the nodes as well as possibly reducing the attack surface of the container even further (I suppose it’s harder to do things in a container if utilities like shell or bash don’t exist within it). You would probably see errors such as this for those containers that have somewhat remove the shell/bash:\n","title":"Debugging Distroless Kubernetes Pods","type":"posts"},{"content":"Before reading on, I\u0026rsquo;d like to give a disclaimer that the experience recounted here does not count as medical advice. As with anything to do with health or your body, do seek the appropriate medical channels (your doctor etc). Whatever is mentioned here might work for my case but it could be completely different for each person due to different past diets/medical history.\nPrologue # About 1-2 years back (I guess just before covid really hit the world), my weight started to balloon to a scary degree; it had always been consistently high and I\u0026rsquo;ve tried multiple approaches to reduce it. One way was to stick to \u0026ldquo;healthier\u0026rdquo; foods with more greens (e.g. Subway) and to cut back on carbs such as rice and bread. Even when going to those ZiChar stalls etc, the main dish I would usually eat was vegetables and maybe half a plate of rice (with a small bit of fish/chicken/seafood). 
However, my weight never really went down; it only rose slowly but continuously.\nI guess one reason for this is that I probably didn\u0026rsquo;t exercise or move around as much as I could have. I only reduced my food intake but, at the same time, I didn\u0026rsquo;t increase the amount of exercise I did. So maybe, if I had moved around a lot more, I could have lost some weight here and there. But then again, I do think it\u0026rsquo;s kind of demotivating to see that to burn off a small-ish food item (say, a small piece of chicken), you have to work a heck of a lot harder. For example, a chicken breast (which is already a healthier food) has about 165 calories (based on a quick search). But to \u0026ldquo;burn it off\u0026rdquo;, so that calories consumed equal calories burned, one would need to run maybe 20-30 minutes (a 30 minute run can use 200-500 calories). Also, all these numbers are iffy in nature; they\u0026rsquo;re obtained from a quick search on the web and I don\u0026rsquo;t have time to verify whether the values are really true.\nHowever, regardless of whether the above numbers are true or not, it\u0026rsquo;s not exactly a pleasing thought to always have something at the back of my head doing mental calculations on whether to eat a piece of food or not, and whether I would need to spend all that extra time to burn it off.\nAlso, that\u0026rsquo;s not the end of it. Articles started popping up about how, if one tries to reduce calorie intake, the body can compensate by lowering its basal metabolic rate. There were a bunch of studies done on weight loss participants. These participants went through heavy diet changes and extreme exercise regimens, and I would assume that the only way for them to keep that weight off is to continue with said changes for the rest of their lives. 
However, that does sound like a pretty miserable journey (especially so in my case, since I don\u0026rsquo;t exactly like exercising in humid Singapore).\nAn article about weight regain among weight loss participants:\nhttps://www.nytimes.com/2016/05/02/health/biggest-loser-weight-loss.html\nI guess all of these are just excuses at the end of the day, but during that whole time before hitting my \u0026ldquo;peak\u0026rdquo; weight, I did wonder if there was an easier way to go about this \u0026ldquo;weight loss\u0026rdquo; without the mental math of keeping tabs on calorie input and output as well as exercising constantly.\nAlso, just putting it out there: my career was kind of \u0026ldquo;stabilizing\u0026rdquo;, so I thought that maybe now was actually a good time to work on my health. I had kept putting it aside to focus on my career, but with me being somewhat satisfied with where I am at my job as well as in my community activities in the tech space, I just thought to myself: Why not now?\nStep 0: Prepping for weight loss process # As the old adage from management thinking goes, \u0026ldquo;you can’t manage what you can’t measure\u0026rdquo;; following that line of thought, in order to manage my own weight, I would need to measure and continuously monitor it. So the first step was to actually get/buy a weighing machine. The weight needs to be recorded on a day to day basis.\nOne thing I\u0026rsquo;ve learnt during all this constant measurement is that a person\u0026rsquo;s weight varies throughout the day. Consumption of food/drinks increases weight while any \u0026ldquo;toilet activities\u0026rdquo; result in a reduction in weight. 
It is vital to pick a standard time to measure weight; I decided to measure it right after waking up (regardless of whether it\u0026rsquo;s in the morning or in the afternoon after I\u0026rsquo;ve slept in). That way, the weight is somewhat \u0026ldquo;consistent\u0026rdquo;, since other variables like consumption of food/drinks would not affect it too much.\nI guess some would say that the weight of clothes affects the measurement, but I do think it\u0026rsquo;s kind of a hassle to keep taking off clothes just before measuring, especially right after I wake up in the morning.\nExperiment 1: Cut breakfast # I started watching a whole bunch of YouTube videos that question the importance of breakfast, which I can kind of agree with. Why is breakfast \u0026ldquo;the most important meal\u0026rdquo; of the day? Why do we need 3 meals a day? Does breakfast being \u0026ldquo;the most important meal\u0026rdquo; mean that it has to be rich in calories/nutrients?\nThis was one of the videos that started that whole thing:\nThinking about it, skipping one meal would reduce the amount of calories consumed for the day. That is a convenient thing when I\u0026rsquo;m trying to lose weight. Rather than trying to reduce the amount of food per meal, it would actually be way easier (accounting-wise) to just cut a meal and eat normally for lunch and dinner.\nWith this somewhat minor change, my weight dropped from a peak of 130kg to 110kg over 6 months.\nExperiment 2: Intermittent fasting # The previous weight loss \u0026ldquo;experiment\u0026rdquo; eventually stopped having much impact. My weight started fluctuating up and down; no further downtrend could be observed. 
It\u0026rsquo;s as though the body had already determined the weight it wanted to settle at and stuck to it as much as possible.\nThis was when videos on intermittent fasting started coming out (even on mainstream Singapore media - CNA).\nThe only difference between what I was already doing and intermittent fasting is that I still drank liquids that contain calories (at least in the cases where I\u0026rsquo;m drinking coffee with milk - there is no sugar in it) in the morning or late in the evening. My meals were already within the intermittent fasting time window. The intermittent fasting I\u0026rsquo;m referring to here is the 16-8 combination, where one undergoes 16 hours of fasting (only liquids with 0 calories are allowed) and can only eat within an 8 hour window.\nSeeing the minor difference from my current diet, as well as the videos that mention potential health benefits of intermittent fasting, I thought to myself: \u0026ldquo;why not give it a shot?\u0026rdquo;. And with that, I made the transition to that diet.\nAs expected, similar to experiment 1 of just cutting breakfast, the initial rate of weight loss was high. However, weight loss eventually plateaued. I could still try to push my weight down using this approach, but the rate of weight loss was sometimes not high enough to warrant the effort. What would work here would be to actually eat healthily and begin some sort of exercise regimen to continue the weight loss, but that seemed like \u0026ldquo;too much effort\u0026rdquo; to continue.\nThis approach resulted in a weight loss from 110kg to 95kg over the course of 8 months.\nExperiment 3: Long term fasting # Similar to the situation in the previous phases, my weight started to stagnate and fluctuate up and down for quite a while. 
I\u0026rsquo;d imagine that continuing my previous phase\u0026rsquo;s approach would eventually lead to more weight loss, but it\u0026rsquo;s just really, really slow at this stage - my guess is that with a lighter body, my energy requirements (and calorie requirements) went down as well. I guess one way is to actually go the miserable route: cut calories and finally start moving a lot more?\nUnless\u0026hellip;\nVideo on a dude\u0026rsquo;s experience on a 5 day fast\nI came across this video by accident and started going down the rabbit hole of why he did it (even though he\u0026rsquo;s kind of already in shape etc). And then, I came across the following videos as well.\nVideo on Fasting benefits\nVideo on How to slow aging on Veritasium YouTube Channel\nThere were a lot more videos on this topic than expected. A lot of them were from \u0026ldquo;nutritional experts\u0026rdquo; or \u0026ldquo;health experts\u0026rdquo; who would tell you to do \u0026ldquo;ACTIVITY X\u0026rdquo; or eat \u0026ldquo;ITEM X\u0026rdquo; and it will heal you and stuff, but that never appealed to me. The ones that were a lot more interesting were those where the commentary was given by doctors (at least, they claim to be doctors on video). They make claims, but they also inject some scientific information that kind of coincides with what I learnt when I was still in university.\nLet me list down some points here:\nGlucose enters the body after digestion (broken down from starch, sucrose etc)\nToo much glucose in the blood stream is bad, so the body produces insulin as a response to it\nThe insulin response tells cells to take up glucose to build up chains of glycogen\nAny further excess of glucose is converted to fat (I didn\u0026rsquo;t exactly learn how in school). 
According to an article I read online, high insulin levels in the blood tell adipose (fat) cells to take up glucose for storage - https://www.nih.gov/news-events/news-releases/nih-study-shows-how-insulin-stimulates-fat-cells-take-glucose\nI guess in computer terms, I\u0026rsquo;d imagine glycogen as some sort of energy \u0026ldquo;cache\u0026rdquo; and fat as some sort of energy \u0026ldquo;disk\u0026rdquo;\nHigh insulin levels prevent energy usage from fat cells; as long as insulin levels remain high, fat remains locked up. (I guess with this, the caloric model of handling weight loss is kind of broken-ish)\nHunger is just a hormonal response and can be handled/controlled. Being hungry doesn\u0026rsquo;t exactly mean one is \u0026ldquo;out of energy\u0026rdquo;; sometimes it just means that it\u0026rsquo;s the usual time to eat, or maybe blood sugar dipped slightly below usual levels (before glycogen/fat stores are unlocked)\nThis is partially why intermittent fasting/long term fasting kind of works when it comes to losing weight - it forces insulin levels to stay low, and with that, we can tap into the vast energy stores in the fat cells.\nI tried long term fasts a couple of times, and the experience usually goes as follows:\nDay 1: Feel great - no issue with hunger (since I\u0026rsquo;m already intermittent fasting anyway)\nDay 2: Hunger pangs start to come in; maybe a slight headache (could be from the brain switching fuel from glucose to ketones? maybe?)\nDay 3-5: Feel ok during the day but sometimes find it hard to sleep. Caffeine is way more effective in the fasted state than in the fed state\nNaturally, the longer the fast, the faster the weight drops (still a goal to this day). I\u0026rsquo;m close to my healthy BMI range and I just need to continue on this journey for a couple of months before finally just maintaining my weight and food intake. 
One thing I found out is that for Asian males, the BMI to target is 23.0, which is really quite low - I had previously thought it was 25.0, but apparently further studies point out that it is better for Asians to aim for a lower BMI due to different body builds. If you wish to calculate your BMI, you can try using this mini tool that I\u0026rsquo;ve built for myself: BMI Calculator\nAnother effect that came along with these experiments is that I actually prefer to be in a \u0026ldquo;hungry\u0026rdquo; state rather than a \u0026ldquo;fed\u0026rdquo; state. While in the \u0026ldquo;hungry\u0026rdquo; state, I feel more active, my mind is sharper and I feel way more motivated compared to the \u0026ldquo;fed\u0026rdquo; state. In the \u0026ldquo;fed\u0026rdquo; state, it always feels as though I\u0026rsquo;m in some sort of \u0026ldquo;food coma\u0026rdquo; - movements are quite sluggish and all I want to do is lie down and do nothing for the whole day. However, these are just my feelings about how my body is at that point in time; who knows, it could just be a placebo effect from watching all those YouTube videos on long term fasting.\nAfterthoughts # This whole weight loss has been a huge learning journey; the part that surprised me the most was the one about the food pyramid. I still remember that when I was young, I thought that following the food pyramid was the way to get a healthy body. That, as well as plenty of exercise. Who would have known that there was a huge dark secret behind all those food recommendations.\nNowadays, food recommendations are quite complex and they change every other year. A while back, they were saying all fats are bad, but now certain fats are good while plenty of others are bad. 
So rather than following all those \u0026ldquo;recommendations\u0026rdquo;, which change every once in a while, why not stick to something that works well for me; which in this case is the whole fasting routine.\n","date":"5 April 2022","externalUrl":null,"permalink":"/a-weight-loss-journey/","section":"Posts","summary":"Before reading on, I’d like to give a disclaimer that the experience recounted here does not count as medical advice. As with anything to do with health or your body, do seek the appropriate medical channels (your doctor etc). Whatever is mentioned here might work for my case but it could be completely different for each person due to different past diets/medical history.\n","title":"A Weight Loss Journey","type":"posts"},{"content":"NOTE: This post is only my personal view, formed over the course of my work in application development and developer operations roles across multiple companies and side projects. This might sound like random rambling to a software developer working in the industry, but sometimes it gets pretty irritating when people argue that certain decisions should be made for \u0026ldquo;performance\u0026rdquo; and provide only vague reasons for it.\nWhen one creates an application, there are various concerns to focus on in order to get it into production safely. The main concern would definitely be the development of the business logic - this is needed to help the company make money/save money. This should be priority one; all other concerns are secondary to that initial goal. Some of the other concerns that we also need to take into account would be:\nApplication security\nOperationability (ease of operating the application in its hosting environment)\nPerformance. 
All of the factors listed above are definitely important, but I do feel that \u0026ldquo;performance\u0026rdquo; is definitely not as important as some of the others, such as security and operationability. I\u0026rsquo;ll expand on this further down in the post.\nSecurity # Security is definitely one of the more important factors, after development of business logic. (Some would even argue that one should focus on security more than business logic concerns.) Security is especially important nowadays, since applications are usually exposed to the world wide web and any hack can potentially result in a disastrous loss of data and of trust in the applications run by the company. That would inevitably affect the bottom line, making this a priority one concern to tackle. Many of the common security issues can be avoided by following the usual best practices for deployment and application development. One example would be SQL injection. This issue is dangerous due to the information that can potentially be leaked from the application when the right query is put into a text box on the frontend. An even worse scenario would be a user inserting a SQL command that drops tables. I don\u0026rsquo;t think there is much to argue about the importance of security compared to performance concerns; I would be hard-pressed to find anyone who would readily approve changes that make an application perform better at the cost of additional security risk to users (or of making the application less useful for users).\nOperationability # Operationability is the next point I intend to cover, and I generally believe that it should also take higher priority than performance. What I mean by operationability is the ease/difficulty of running and managing the application in production settings. 
Some of the operational aspects that a person would need to care about for an application would be how easy it is to upgrade/roll back the application as and when needed, and the procedures for scaling its various aspects - especially when it comes to data storage (e.g. reliance on various storage mechanisms such as databases etc).\nSo why is operability important? One primary reason is that running an application takes effort, time and resources. If you have an application that is \u0026ldquo;highly performant\u0026rdquo; but requires a large amount of support hours, you had better make sure that the \u0026ldquo;highly performant\u0026rdquo; application is extremely valuable to the company. If such an application requires a lot of hand holding as well as eyeballs to ensure that it operates well, then it would be very, very expensive to run. Headcount needs to be spent on it, which is why the application has to be very \u0026ldquo;valuable\u0026rdquo; to justify such headcount usage. And this is on the big assumption that we can properly attribute which of the company\u0026rsquo;s applications is its main money maker.\nMoney/expenditure to support applications is not the only reason. If an application requires a large amount of support to keep it running in production, that would most likely mean that the support engineers/technicians (SRE?) supporting the application would be required to perform a large number of manual steps to keep it alive. I\u0026rsquo;ve never seen a process where a human runs a number of manual steps to properly run/debug an application go error-free, especially if those same manual steps have to be repeated across multiple regions/zones for the application. 
There is an extremely high likelihood that an error is gonna appear, and when it does, the application developer is definitely gonna be dragged in to try to fix said errors/issues.\nSome of the steps to make an application more \u0026ldquo;operable\u0026rdquo; would be to ensure that the application is simple to run and that it is in line with the processes of other applications the company runs in production. E.g. let\u0026rsquo;s say a company has 20+ web applications connected to MySQL databases running in production; adding another web application that connects to a MySQL database would be far more trivial than running a web application that connects to a database the company is not used to operating (e.g. Cassandra), for which new processes/automations need to be built.\nAnother way is to invest the time and effort to write automation scripts using automation and CI/CD tools such as Ansible/Terraform/Jenkins etc. The investment in such tools pays off almost immediately, and it would definitely save a lot of time and effort in maintaining such applications in production.\nPerformance # We are now at the main part of this blog post, which is to explain why performance should not be prioritized as highly as other factors. I\u0026rsquo;m definitely not saying that application performance is not important - more that it\u0026rsquo;s \u0026ldquo;less\u0026rdquo; important compared to factors such as operability and security of the application. It is more \u0026ldquo;important-er\u0026rdquo; to make sure that the applications that we build are \u0026ldquo;secure\u0026rdquo; and \u0026ldquo;easy to operate\u0026rdquo;.\nThe thing about performance is that there is no \u0026ldquo;end goal\u0026rdquo; for performance. 
There is usually always something to do to make an application more \u0026ldquo;performant\u0026rdquo;, but what we need to identify and understand is that performance matters because it allows companies to be more efficient with the resources that they have. It\u0026rsquo;s all about the money. The application should be optimized such that the cost of running it is as low as possible, without resulting in constant issues that require support to keep the application running.\nThis makes one wonder - how does one decide if the application is \u0026ldquo;performant\u0026rdquo; enough? That is where monitoring as well as SRE principles come in. Let\u0026rsquo;s take an example: say we have an API endpoint where it is especially important that latency is low. But how low should it go? Maybe a latency of 1s for 99% of requests is good enough? Or 1s for 99.9% of requests? Once this is defined, we can alter our application architecture to meet this goal as simply as possible. The simpler the codebase that meets that \u0026ldquo;performance\u0026rdquo; goal, the easier it is for application developers to support the application, the easier it is for the SRE team to support the application in production, etc. Essentially, the TL;DR is: \u0026ldquo;Do just enough engineering to meet our business requirements\u0026rdquo; and never do more.\nI\u0026rsquo;ve seen my fair share of stories of people making certain decisions in order to make an application more performant; e.g. putting the application near its databases, deciding never to use queue systems and instead having the application handle queues, or deciding not to rely on Redis and instead embedding the API cache into the application. There are definitely reasons for such technical decisions, but those decisions should serve as learning points about the \u0026ldquo;pain\u0026rdquo; of supporting such applications. 
Those decisions aren\u0026rsquo;t \u0026ldquo;wrong\u0026rdquo; per se; they just come with trade offs. An example could be an application that never relies on an external queue system such as Kafka/Redis/NATS but instead implements its own queue system. One of the drawbacks is that application developers now need to support functionality that is already generically available in the market; it\u0026rsquo;s a tech burden that the team has to carry - the self-implemented queue system had better be worth it. Another drawback is that with the queue system inside the application, more resources need to be spent on that same application, and teams may need to keep the app\u0026rsquo;s availability higher than they can comfortably afford. This is a consequence of having such a feature.\nA common quote here is \u0026ldquo;premature optimization is the root of all evil\u0026rdquo; (Donald Knuth). It is better to first build the application with \u0026ldquo;minimal\u0026rdquo; effort so that the business logic runs, before starting the whole optimization process. Maybe for the initial version of the application, we can try to run it without a cache, and if we find that the performance of the APIs we\u0026rsquo;re providing is too horrible, we can then consider the cache idea. Or we can also relook at some of the SQL queries being run; maybe a query is selecting too many records in its first pass etc. This is better than worrying for months about whether a cache is needed, or having constant arguments about whether to keep the cache within the application or offload it to a central Redis server or a dedicated Redis server for the application.\nAs an afterthought, I guess this kind of explains a bit of why the whole tech sector is moving towards containers. 
Applications within containers may take a small performance hit (we\u0026rsquo;re traversing through another layer of abstraction); however, it makes it easier to understand what\u0026rsquo;s running in production if you\u0026rsquo;re able to isolate and encapsulate the runtimes of applications that are to be shipped to production. The encapsulated container can be tested in various testing environments, which makes it easier to understand what version of the application is in production and ensures that the application\u0026rsquo;s dependencies are brought along with it. But of course, this is probably just one aspect of such a decision; there are definitely a variety of reasons that would lead a company to decide to utilize such technologies.\n","date":"20 March 2022","externalUrl":null,"permalink":"/application-performance-isnt-the-most-important-factor-in-application-development/","section":"Posts","summary":"NOTE: This post is only my personal view, formed over the course of my work in application development and developer operations roles across multiple companies and side projects. This might sound like random rambling to a software developer working in the industry, but sometimes it gets pretty irritating when people argue that certain decisions should be made for “performance” and provide only vague reasons for it.\n","title":"Application Performance isn't the most important factor in application development","type":"posts"},{"content":"This is a list of notes on possible interview questions for devops roles. Interview questions for devops are particularly hard to cover since devops roles generally cover a broad range of topics and technologies. 
I will update this page as I see any interesting or \u0026ldquo;hard\u0026rdquo; questions to cover.\nWeirdly enough, a lot of the questions gathered here are \u0026ldquo;fringe\u0026rdquo; edge cases that one may accidentally come across due to unique use cases.\nI will update this post as time goes by, if there is more information on this.\nGeneric\nHow is a computer assigned an IP Address in a LAN?\nWhat happens when a user accesses a website from a web browser?\nWhat are the different kinds of DNS Records?\nWhat are some differences between Redis and Memcached?\nWhat is the purpose of a Certificate Authority?\nWhat\u0026rsquo;s the difference between threads and processes?\nHow do we monitor Java applications?\nWhat is Swap space used for?\nWhat are Huge pages in linux used for?\nWhat is the difference between TCP/UDP?\nWhat is ICMP?\nWhat are the fallacies of distributed computing?\nAny useful guidelines when deciding on what metrics an application should have?\nWhat are some useful linux commands?\nWhat\u0026rsquo;s the meaning of some of the following terms when handling systems:\nWhat are inodes, hard links and symlinks, file descriptors (FD) in linux filesystem?\nHow does one improve security posture of deployments?\nIPTables Commands\nHow does MITM (Man in the Middle) attack work?\nHow does TLS prevent MITM attack?\nWhat are Anycast IPs? How do Anycast IPs work?\nWhat is AppArmor and what does it do?\nWhat is Seccomp and what does it do?\nWhat are some of the steps one could take to harden a linux instance?\nSystem Design\nReferences\nDesign a commenting system\nDesign a CDN\nDesign a code-deployment system\nDesign an API rate limiter system\nDocker\nWhat\u0026rsquo;s the difference between COPY and ADD?\nWhat\u0026rsquo;s the difference between CMD and ENTRYPOINT?\nWhy use Exec form over Shell form in Dockerfile?\nHow is isolation achieved in Docker?\nHow does volume mounting work in Docker?\nAssume you have an application that requires MySQL database. 
Assume that the app and database are deployed in 2 separate containers. Why can\u0026rsquo;t the application use \u0026ldquo;localhost:3306\u0026rdquo; to connect to the database?\nKubernetes\nWhat is the architecture of Kubernetes?\nWe usually disable swap space when running Kubernetes 1.21 and earlier. Why?\nWhat are some of the ways to expose application endpoints within k8s externally?\nWhat\u0026rsquo;s the difference between statefulsets and deployments? And how do statefulsets allow databases to be deployed safely into Kubernetes?\nHow does an external network request reach a pod via Ingress?\nHow is volume mounting handled in Kubernetes?\nWhat is a headless service?\nWhen creating an operator, how are reconciliation loops started?\nHow does one achieve multi-tenancy in a Kubernetes environment?\nWhy can\u0026rsquo;t you ping a service?\nDebugging steps for Kubernetes Applications\nWhat are some of the security steps to harden Kubernetes deployments?\nDatabases\nHow does one go about \u0026ldquo;resharding\u0026rdquo; a database?\nUseful links\nGeneric # How is a computer assigned an IP Address in a LAN? # Either the IP address is statically assigned or, as with most computers on home/office networks, it is assigned via DHCP:\nThe client computer broadcasts a DHCP Discover message (since it has no IP address yet)\nThe DHCP server (usually the router) responds with a DHCP Offer: an IP address to offer + default gateway + subnet mask etc\nThe client computer responds with a DHCP Request to \u0026ldquo;claim\u0026rdquo; the IP address\nThe DHCP server responds with a DHCP Acknowledge (ACK) confirming that the IP address has been claimed; the lease lasts for a period such as a few hours (based on the router\u0026rsquo;s configuration). The acknowledge message may contain some additional \u0026ldquo;options\u0026rdquo; information such as the DNS server etc. 
Technically, DHCP servers can be configured to issue such information accordingly.\nReferences: https://support.huawei.com/enterprise/en/doc/EDOC1000178170/225eec10/dhcp-messages\nWhat happens when a user accesses a website from a web browser? 
# DNS Resolving Check the local /etc/hosts file (and local DNS caches) for the first level of DNS resolution Reach out to the DNS server on the current local network if set up (e.g. running your own DNS server, or usually the router) - this information is usually set during DHCP If the local network\u0026rsquo;s DNS server cannot answer, the query is forwarded further out, e.g. to the network provider\u0026rsquo;s resolvers (in Singapore, could be Starhub/Singtel\u0026rsquo;s DNS servers), which in turn query the root and authoritative name servers The local network\u0026rsquo;s DNS server is skipped if a specific DNS server is set in the workstation\u0026rsquo;s network configuration (e.g. 8.8.8.8, 8.8.4.4, 1.1.1.1) Connect to remote IP Address The process starts to connect to the remote IP Address It uses the network configuration to decide where to send the first hop It applies the subnet mask to both IP Addresses - if the \u0026ldquo;network\u0026rdquo; part is different for the remote server, the request has to be sent to the Default Gateway (for local workstations, that would be the wifi router) TCP Handshake It comes before the TLS/SSL step Client sends SYN (Synchronize Sequence Numbers) Server sends SYN/ACK (Synchronize + Acknowledge) Client sends ACK (To say that it has received the server\u0026rsquo;s SYN) SSL Handshake or TLS Handshake If the website is accessed via https Refer to the following website for more details Client Hello (Includes the TLS versions and cipher suites that the client browser supports + a client random string) Server Hello (Chosen cipher suite + a server random string, along with the server\u0026rsquo;s SSL cert containing its public key) Authentication (Client checks if the SSL cert is valid - e.g.
not expired, valid chain of certs, trusted certs) Premaster secret (Client generates a premaster secret, encrypts it with the server\u0026rsquo;s public key and sends it over to the server - it can only be decrypted with the server\u0026rsquo;s private key) Private key used (Server decrypts the premaster secret) Session key created (Both client and server generate session keys using the client random string, the server random string and the premaster secret) Client ready Server ready Handshake complete (Note: this describes the RSA key exchange used in TLS 1.2 and earlier; TLS 1.3 uses an ephemeral Diffie-Hellman key exchange instead) Fetch HTML from website (could come from server/CDN/Cached Responses in Load Balancer) While rendering HTML, fetch JavaScript files, images etc JavaScript could be used to fetch results from APIs etc. To prevent security issues, browsers enforce the same-origin policy - cross-domain API calls only succeed if the API allows them via CORS headers References:\nhttps://www.youtube.com/watch?v=VONSx_ftkz8 What are the different kinds of DNS Records? # A Record - Mapping domain to IPv4 addresses AAAA Record - Mapping domain to IPv6 addresses CNAME Record - Mapping domain to another domain as an alias NS Record - Provide information on the authoritative DNS server for that domain MX Record - Provide information on where the emails meant for that domain are supposed to go TXT Record - Adding text information to the domain records (e.g. adding text to prove that you own the domain etc) What are some differences between Redis and Memcached? # Both are caching tools that store items in memory; however, due to different implementations, they come with their own set of restrictions or drawbacks.\nMemcached is very simplistic; Redis is very feature rich and can store complex data models Memcached doesn\u0026rsquo;t even have cluster mode; Redis allows cluster mode to handle higher throughput. 
(Means for memcached - \u0026ldquo;cluster\u0026rdquo; mode would need to rely on clients - clients would need to implement all that logic) Memcached is multi-threaded while redis is \u0026ldquo;single threaded\u0026rdquo;.
This means that if any command is slow/blocking, no other requests can be served until it\u0026rsquo;s done. References:\nhttps://medium.com/@jychen7/sharing-redis-single-thread-vs-multi-threads-5870bd44d153 https://medium.com/@SkyscannerEng/scaling-memcached-cdef01e150a1 https://github.com/memcached/memcached/wiki/Commands https://redis.io/commands What is the purpose of a Certificate Authority? # A certificate authority (CA) is an organization that verifies the identity of website owners and issues digital certificates vouching for them; browsers trust these certificates because they trust the CA. A user who wants a certificate from such a third party issuer would first need to create a private key and then a certificate signing request (CSR). The CSR is passed to the CA, which validates it and uses it to issue a signed certificate that the user can then use.\nWhat\u0026rsquo;s the difference between threads and processes? # A process is a program in execution, while a thread is a unit of execution within a process Processes are \u0026ldquo;heavy\u0026rdquo; and take a while to start, while threads are set up much faster Each process has its own separate memory space, but threads in a process share the same memory space (E.g. A golang application runs in a process which sets up 1 or more threads, onto which the Go runtime schedules goroutines) How do we monitor Java applications? # Java applications generally run inside their own runtime: the Java Virtual Machine (JVM). A normal monitoring solution that monitors the server/container running the Java application will not reflect how much memory the Java application is actually using. Java applications (at least on JDK 8) usually reserve a block of memory (the heap) on startup Monitoring solutions such as Prometheus need an exporter (e.g. 
the JMX exporter) to export JVM metrics that show the true state of the Java application What is Swap space used for?
# In servers, there is a limit to how much memory is available for the server to use (which includes running important kernel level functionality). However, there are cases where the amount of memory on the server is not sufficient. Swap space is essentially \u0026ldquo;disk space\u0026rdquo; where memory chunks are stored temporarily. Access to it is much slower (Memory access speeds \u0026gt;\u0026gt; physical storage access speeds) - this might induce latency hits on applications etc\nSwap gets used more and more as memory pressure increases, making the system slower and slower. You can see the impact of this on CPU - the kernel needs to spend cycles moving data from storage back to memory to compute, before writing it back out to disk.\nWhat are Huge pages in linux used for? # The kernel manages memory in fixed-size blocks called pages, and data moved between storage and memory is also handled in pages. A typical page is 4KiB. Every page needs its own entry in the page tables (and, when in use, in the CPU\u0026rsquo;s TLB cache of address translations), so an application using lots of memory needs a huge number of 4KiB pages. Huge pages (e.g. 2MiB on x86-64) let the same memory be mapped with far fewer entries, reducing this overhead. I\u0026rsquo;d imagine that more memory intensive applications (e.g. databases with large buffer pools) would benefit most from this mechanism.\nOne cost to take note of is that each huge page is a big contiguous chunk of memory - the kernel needs to find space for it. As large chunks get allocated/deallocated, memory gets more and more fragmented - the kernel needs to compact it to free up contiguous space. (Expect CPU to go up)\nE.g. Hadoop performance degradation with THP (Transparent Huge Pages, though partly from a bug) - https://www.ghostar.org/2015/02/transparent-huge-pages-on-hadoop-makes-me-sad/\nI\u0026rsquo;d assume huge pages matter less for workloads whose bottleneck is loading data from storage, since they reduce memory-mapping overhead rather than disk latency (and even more so now that SSDs are common).
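To put numbers on the page-size trade-off, here is a minimal sketch in Python. It assumes the common x86-64 sizes of 4 KiB for a regular page and 2 MiB for a huge page (other architectures differ). The 8 GiB heap is a hypothetical example. The sketch only counts how many page mappings are needed to cover the same memory, which is what drives page-table and TLB overhead.

```python
# Sketch: how many page mappings are needed to cover a block of memory.
# Page sizes assumed: 4 KiB (regular page) and 2 MiB (huge page) on x86-64.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

DEFAULT_PAGE = 4 * KIB
HUGE_PAGE = 2 * MIB


def pages_needed(memory_bytes, page_size):
    # Round up: a partially used page still needs a whole mapping.
    return -(-memory_bytes // page_size)


if __name__ == '__main__':
    heap = 8 * GIB  # hypothetical 8 GiB heap, e.g. a database buffer pool
    print('4 KiB pages:', pages_needed(heap, DEFAULT_PAGE))
    print('2 MiB pages:', pages_needed(heap, HUGE_PAGE))
```

For the 8 GiB example, this prints 2097152 mappings with 4 KiB pages versus 4096 with 2 MiB pages, a 512x reduction.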
SSD is way faster than HDD, so that should be the first optimization for storage-bound workloads rather than looking at huge pages\nWhat is the difference between TCP/UDP? # TCP = Transmission Control Protocol\nUDP = User Datagram Protocol\nTCP is connection based while UDP isn\u0026rsquo;t. No need for UDP to initiate a connection etc, it can immediately send data over the wire TCP delivers data in order (sequencing) while UDP does not TCP guarantees delivery by acknowledging received data and retransmitting lost segments, but UDP does not (lost datagrams are simply lost) UDP is faster than TCP (from lack of overhead) Usage: TCP is used for HTTP/HTTPS, SMTP, FTP etc UDP is used for video streaming, VoIP, DNS (DNS queries mostly use UDP, falling back to TCP for large responses) What is ICMP? # ICMP - Internet Control Message Protocol. Generally used for diagnostic and error-reporting purposes during IP operations.\nUsed via ping commands or traceroute commands. The ICMP messages received would contain diagnostic information on what happened between the source client and server - maybe one of the routers along the path dropped the packet? Or the packet got blocked? Or the destination port is unreachable?\nWhat are the fallacies of distributed computing? # The network is reliable Latency is zero Bandwidth is infinite The network is secure Topology doesn\u0026rsquo;t change There is one administrator Transport cost is zero The network is homogeneous Any useful guidelines when deciding on what metrics an application should have? # https://medium.com/thron-tech/how-we-implemented-red-and-use-metrics-for-monitoring-9a7db29382af\nRED - Rate, Errors, Duration of requests USE - Utilization %, Saturation, Errors\nUSE might be used for systems/metrics that have a \u0026ldquo;maximum\u0026rdquo; - e.g. storage etc RED might be used for something that comes at a rate and theoretically has \u0026ldquo;no limits\u0026rdquo; - e.g. 
requests made to an application\nWhat are some useful linux commands?
# Not in terms of importance:\n# Important tooling to install (if missing) sudo apt update sudo apt install -y iputils-ping vim sysstat net-tools # Management of components sudo systemctl status \u0026lt;component\u0026gt; sudo systemctl list-unit-files | grep enabled # Viewing logs etc of components sudo journalctl -u \u0026lt;component\u0026gt; -f --since \u0026#34;10 minutes ago\u0026#34; --no-pager # Viewing which folder is taking the most space sudo du -sh $(ls) df -h # Finding a file # https://www.tecmint.com/35-practical-examples-of-linux-find-command/ find / -name hosts # Viewing performance at the moment (For quick debugging) # apt install -y procps top # M (by Memory), N (by PID), P (by CPU - processor), T (by time) # E (change units) # t (cpu graph), m (memory graph), k (kill signal), c (show full command line), r (renice) # # First line in top: e.g. # 13:27:26 up 2 days, 18 min, 1 user, load average: 0.00, 0.07, 0.09 # \u0026lt;uptime info\u0026gt; \u0026lt;load average by 1min, 5min, 15min - more than the number of CPU cores means overworked\u0026gt; # # Third line in top: e.g. # %Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st # us - % user processes # sy - % system (kernel) processes # ni - % niced user processes # id - % idle time # wa - % wait time for IO # hi/si - % hardware/software interrupts # st - % time stolen from this virtual CPU by the hypervisor # Cleaning up logs sudo journalctl --vacuum-time=10d sudo journalctl --vacuum-size=500M # Handling permission issues sudo chmod +x \u0026lt;binary file\u0026gt; sudo chown \u0026lt;user\u0026gt;:\u0026lt;group\u0026gt; \u0026lt;binary file\u0026gt; # Viewing file vim # Use j, k commands to jump up and down head tail less grep -c \u0026#34;INFO\u0026#34; logname.log # count matching lines grep -ic \u0026#34;INFO\u0026#34; logname.log # case-insensitive count grep -vc \u0026#34;INFO\u0026#34; logname.log # inverse: count lines NOT matching # Disk related commands sudo fdisk /dev/sdb # Create a new partition sudo fdisk -l sudo mkfs -t ext4 /dev/sdb1 lsblk -f sudo mount /dev/sdb1 /var/lib/mysql # Make sure to add an entry to /etc/fstab so the mount survives reboots # Network commands
ifconfig ip a # View network interfaces and their IP addresses ping nslookup \u0026lt;hostname\u0026gt; dig \u0026lt;hostname\u0026gt; # apt install -y dnsutils tcpdump # Capture packets (payloads are only readable if traffic is unencrypted, e.g. plain http) nmap -O localhost # Find out which ports are open (-O also attempts OS detection) nmap -sU -O localhost # UDP port scan nmap -sT -O localhost # TCP connect port scan nmap -A \u0026lt;remote ip address\u0026gt; # Get details of remote server (OS, service versions etc) netstat -tunlp # Check which ports are open and which processes are listening traceroute # Trace the hops to a destination (there was a time where Starhub was dropping traffic for laptops while working) arp # View the ARP table (IP to MAC address mappings of nearby network equipment) hostname # Get info of how the machine is presented to network # Linux firewalls iptables -L # Open files lsof -i -P -n # Find which process connected to which port # Compare files diff \u0026lt;filename1\u0026gt; \u0026lt;filename2\u0026gt; # Check CPU information cat /proc/cpuinfo # Performance Troubleshooting Demos # https://www.youtube.com/watch?v=rwVLa9me7e4\u0026amp;ab_channel=grobelDev # https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55?gi=3d8d7960fce4 uptime dmesg -T | tail vmstat 1 # Virtual Memory stats - if r (runnable processes) \u0026gt; number of CPUs, the CPUs are saturated mpstat -P ALL 1 # Report processor stats - if one is busy but the rest are not - could be a single threaded app pidstat 1 # PID stats - check on pid level - which processes are taking resources? iostat -xz 1 # IO stats - check if io devices are the bottleneck?
free -m # Memory usage in MiB sar -n DEV 1 # (System Activity Report) Networking check - throughput per interface sar -n TCP,ETCP 1 # Networking check - TCP connections and errors/retransmits top # General overview of system - hard to capture short-lived processes atop # Top but with historical info strace -tp `pgrep lab003` 2\u0026gt;\u0026amp;1 | head -100 # Check system calls (for detailed reasons for why app is so busy on system) perf record -F 99 -a -g -- sleep 10 # Capture perf info perf report -n --stdio What\u0026rsquo;s the meaning of some of the following terms when handling systems: # SLI - service level indicator SLO - service level objective SLA - service level agreement MTBF - Mean time between failures MTTR - Mean time to recovery or repair or respond (they all mean different things) RTO - Recovery Time Objective. Maximum acceptable time for services to be restored to working order after an outage RPO - Recovery Point Objective. Maximum acceptable amount of data loss, measured in time (how much recent data the organization is willing or ok to lose) Incident Handling Post Mortem Root Cause Analysis What are inodes, hard links and symlinks, file descriptors (FD) in linux filesystem? # Inodes are data structures that store the metadata of a file (permissions, owner, size, location of data blocks) but not its name Hard links are directory entries that point to an inode - a file can have multiple hard links, and its data is only freed once the last one is removed (and no process still has it open) Symlinks - symbolic links are soft links: special files that store a path to another file (if the target is removed, the symlink is left dangling) File descriptors - are integer handles that a process uses to refer to \u0026ldquo;open\u0026rdquo; files or \u0026ldquo;open\u0026rdquo; sockets (relates back to how linux was designed where everything would ideally be represented as a file) How does one improve security posture of deployments?
# Note: Attestation means evidence or proof of something\nEnsure that dependencies that applications rely on are scanned to ensure that they don\u0026rsquo;t contain malware Ensure that no secrets are in Git Utilize services that make secret rotation easier (e.g. Secrets Manager, Vault etc) If applications are in containers Ensure no secrets are baked into the image Reduce attack surface by reducing the amount of dependencies (use smaller images - e.g. slim or alpine or distroless images) Scan container images to check for vulnerabilities Maybe consider micro-VMs (containers share the host\u0026rsquo;s kernel, so a container breakout can affect the host) Ensure the container runs as non-root If applications are in Virtual Machines Try to ensure that VMs are not exposed to the internet unnecessarily. E.g. Google Compute Engine by default has a public interface; if the VM doesn\u0026rsquo;t need it, it shouldn\u0026rsquo;t have it Maybe consider the SPIFFE project - ensure that nodes can only talk to nodes that are within the \u0026ldquo;trusted\u0026rdquo; circle and have gone through the appropriate \u0026ldquo;attestation\u0026rdquo;, so that there is proof of who each node is If applications are in Kubernetes Use network policies to restrict which applications can communicate with each other Possibility to utilize a service mesh so that services communicate with each other using mTLS Possibility to utilize Binary Authorization with attested images (e.g. to prove that a security scan was done) Use separate service accounts and RBAC for applications (more granular permission control) Set containers such that they run as non-root Set containers such that they drop the Linux capabilities they don\u0026rsquo;t need For all deployments Ensure that logs emitted from all applications do not print out security tokens/credentials or user information - logs need to be constantly scanned for such information Ensure resource limits are set (to ensure no runaway application) Utilize linux tooling Apparmor: Mandatory
Access Control framework that functions as an LSM (Linux Security Module). It is used to allow or deny a subject\u0026rsquo;s (program\u0026rsquo;s) access to an object (file, path, etc.). Seccomp: a Linux feature that allows a userspace program to set up syscall filters. Dropping capabilities IPTables Commands # Here is a list of iptables commands (with explanations of what each is doing)\nView the following youtube series: https://www.youtube.com/watch?v=xHwtG9S8Fwo\u0026list=PLvadQtO-ihXt5k8XME2iv0cKpKhcYqe7i\u0026index=2\u0026ab_channel=SysEngQuick\napt install iptables-persistent # Save the current rules so that they are restored on boot iptables-save \u0026gt; /etc/iptables/rules.v4 # Accept all input and output traffic iptables -A INPUT -j ACCEPT iptables -A OUTPUT -j ACCEPT # Set default policies for input, forward and output chains iptables -P INPUT DROP iptables -P FORWARD DROP iptables -P OUTPUT DROP # Accept traffic on the localhost interface (from ifconfig - view network interfaces) iptables -A INPUT -j ACCEPT -i lo iptables -A OUTPUT -j ACCEPT -o lo # Easier to view and debug iptables rules - default is filter table iptables -L -n -v --line-numbers # View the NAT table iptables -L -n -v --line-numbers -t nat # Allow currently established connections to continue iptables -A INPUT -j ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED iptables -A OUTPUT -j ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED # Adding comments iptables -A INPUT -j ACCEPT -p tcp --dport \u0026lt;port\u0026gt; -m comment --comment \u0026#34;Test comment\u0026#34; # Deleting rules (by rule number, as shown by --line-numbers) iptables -D INPUT 1 iptables -D OUTPUT 1 # Adding specific examples for specific protocols (icmp-type 8 is echo request, i.e. ping) iptables -A INPUT -j ACCEPT -p icmp --icmp-type=8 iptables -A INPUT -j ACCEPT -p tcp --dport 22 iptables -A OUTPUT -j ACCEPT -p icmp --icmp-type=8 iptables -A OUTPUT -j ACCEPT -p tcp --dport 22 iptables -A OUTPUT -j ACCEPT -p tcp --dport 80 iptables -A OUTPUT -j ACCEPT -p tcp --dport 443 iptables -A OUTPUT -j ACCEPT -p tcp --dport 53 iptables -A OUTPUT -j ACCEPT -p udp --dport
53 iptables -A OUTPUT -j ACCEPT -p udp --dport 123 # NTP traffic iptables -A FORWARD -m comment --comment \u0026#34;established traffic\u0026#34; -j ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED iptables -A FORWARD -j ACCEPT -i lan -o wan # Enable IP forwarding in the kernel (needed to route traffic between interfaces) echo 1 \u0026gt; /proc/sys/net/ipv4/ip_forward # NAT outgoing traffic leaving via the wan interface iptables -t nat -A POSTROUTING -o wan -j MASQUERADE # Port forward ssh traffic hitting 192.168.5.201 to an internal host iptables -t nat -A PREROUTING -p tcp --dport 22 -d 192.168.5.201 -j DNAT --to-destination 172.16.1.102 iptables -A FORWARD -p tcp --dport 22 -d 172.16.1.102 -j ACCEPT How does a MITM (Man in the Middle) attack work? # TODO: Add content\nHow does TLS prevent MITM attacks? # TODO: Add content\nWhat are Anycast IPs? # TODO: Add content\nHow do Anycast IPs work? # TODO: Add content\nWhat is AppArmor and what does it do? # TODO: Add content\nWhat is Seccomp and what does it do? # TODO: Add content\nWhat are some of the steps one could take to harden a linux instance? # TODO: Add content\nSystem Design # References # https://mecha-mind.medium.com/\nDesign a commenting system # TODO: Add content\nDesign a CDN # TODO: Add content\nDesign a code-deployment system # Which aspect of the code deployment pipeline? Need to involve the CI portion? E.g. testing to ensure that the application is fine before deployment? Testing to ensure that the application is fine after deployment What kind of code are we shipping? Packaged into binary packages etc? Packaged into VMs (e.g. Using Packer) Packaged into Container images? What sort of environment will the code be deployed into? Public Cloud? Privately owned datacentres? How to get artifacts into target environments? Allowed to rely on public cloud? Need to be able to sync large objects across datacentres How to ensure releases are synced across datacentres?
Config files that are synced to ensure that all datacentres run the same version Not all datacentres are of the same size (different DCs have different performance characteristics) - need to adjust datacentre configuration accordingly Monitoring of all processes/pipelines Syncing of artifacts between the various datacentres? Syncing of data centre configurations (Either manually start the process to sync up? Or automatically have each DC run some sort of binary/controller to update the process accordingly) Design an API rate limiter system # https://medium.com/geekculture/system-design-basics-rate-limiter-351c09a57d14\nAlgorithms Leaky bucket (Requests are queued and processed at a fixed outflow rate; needs a big enough queue to hold requests waiting to be outflowed) Token bucket (Have a token generator that adds tokens at a fixed rate up to a maximum; the service fetches a token before serving a request into the system, and rejects the request if none are left) Fixed Window Counter (Have a counter for each window of time, reset once the next window begins) Sliding Log (Store the timestamp of every request and only serve a request if the number of requests in the last window is below the limit) Sliding Window (Combines the concepts of fixed window counter and sliding log) Handling API rate limiting in a large distributed system Centralized datastore to store API request counts? Introduces potential latency on every request? Docker # What\u0026rsquo;s the difference between COPY and ADD? # ADD was probably introduced earlier - ADD can add files from the local filesystem into the image. It can also pull from remote URLs into the image. It can also auto-extract local tar files into the docker image COPY can only copy from the local filesystem (build context) into the image COPY is the more \u0026ldquo;secure\u0026rdquo; (and predictable) solution here of sorts What\u0026rsquo;s the difference between CMD and ENTRYPOINT # CMD -\u0026gt; Sets the default command/parameters, which are replaced by any arguments passed from the CLI (docker run \u0026lt;image\u0026gt; \u0026lt;args\u0026gt;) ENTRYPOINT -\u0026gt; Sets the executable that always runs; arguments passed from the CLI are appended to it rather than replacing it (it can only be overridden with the --entrypoint flag) CMD used when building applications but ENTRYPOINT could be used for \u0026ldquo;utility\u0026rdquo; containers (e.g.
yq container - only need to pass in flags) Reference: https://www.bmc.com/blogs/docker-cmd-vs-entrypoint/\nWhy use Execution form over Shell form in Dockerfile # Shell form in dockerfile -\u0026gt; e.g. CMD ./app Exec (executable) form in dockerfile -\u0026gt; e.g. CMD [\u0026quot;./app\u0026quot;] Shell form always runs the command through a shell (/bin/sh -c), which processes the string (variable expansion etc) before running it - it\u0026rsquo;s like a shell that wraps the string provided to it Exec form skips the shell processing - the command is invoked directly Issues when running an app with shell form - the \u0026ldquo;sh\u0026rdquo; process becomes PID 1 and does not forward signals (e.g. SIGTERM from docker stop) to the app, so the app is not stopped cleanly -\u0026gt; causing issues (hangs on the terminal until Docker force-kills the container) How is isolation achieved in Docker? # Refer to the following video: https://www.youtube.com/watch?v=8fi7uSYlOdc Refer to the following code for the video here: https://github.com/lizrice/containers-from-scratch A container is essentially: Linux namespaces These act as a filter on what you can see from within the container E.g. For the ps command - you can only see pids within that container E.g. For the ifconfig command - you can only see network interfaces relevant to that container Cgroups Mechanism that allows one to limit the resources of a process E.g. cpu/memory etc Building that container runtime might involve: Setting the hostname Changing the root fs to another folder (different from the host) Changing the working directory to / within the new root Mounting \u0026ldquo;proc\u0026rdquo; into the container so that ps works How does volume mounting work in Docker?
# Docker CLI communicates with the local Docker daemon (the \u0026ldquo;server\u0026rdquo;) over its API, reference: https://github.com/docker/cli/blob/cf8c4bab6477ef62122bda875f80d8472005010d/vendor/github.com/docker/docker/client/container_create.go#L54\nFrom docker-cli repo (a \u0026ldquo;post\u0026rdquo; request call to create container) -\u0026gt; moby/moby repo daemon pkg -\u0026gt; createContainer call which then applies OS specific settings. daemon pkg -\u0026gt; createContainerOSSpecificSettings volume/service pkg -\u0026gt; Ask volume service to create -\u0026gt; Ask volume store to create -\u0026gt; Ask volume driver to create -\u0026gt; (The default volume driver, including on Mac, is local -\u0026gt; Creates directory and sets permissions) container pkg -\u0026gt; AddMountPointWithVolume (just object representation) daemon pkg -\u0026gt; populateVolumes Calls the mount helpers from the moby/sys repo Final unix mount command: https://github.com/moby/sys/blob/main/mount/mounter_linux.go#L30 Assume you have an application that requires MySQL database. Assume that the app and database are deployed in 2 separate containers. Why can\u0026rsquo;t the application use \u0026ldquo;localhost:3306\u0026rdquo; to connect to the database?
# Firstly, need to understand the following aspects: On mac/windows, all docker containers are run in a mini linux VM that is provided via Docker Desktop When an app is exposed from a docker container to the host using the -p flag, it goes from the app -\u0026gt; mini linux VM (Docker VM) -\u0026gt; the VM\u0026rsquo;s exposed port -\u0026gt; the host machine, where it is exposed Docker\u0026rsquo;s network is designed so that each container gets its own network namespace and IP address, so \u0026ldquo;localhost\u0026rdquo; inside a container refers only to that container MySQL is not running on the app container\u0026rsquo;s localhost The application in that container needs to reach the other container to access MySQL On the default docker bridge network - need to use the IP address (the default bridge does not provide DNS-based name resolution between containers; apparently this is kept for backward compatibility) https://docs.docker.com/network/network-tutorial-standalone/ https://stackoverflow.com/questions/41400603/dockers-embedded-dns-on-the-default-bridged-network Create a separate user-defined bridge network and containers can connect to each other via names (don\u0026rsquo;t forget to use the --name flag when running the docker container) Or alternatively, use docker-compose Kubernetes # What is the architecture of Kubernetes? # Consists of the following components:\nControl plane components\netcd (stores state of k8s) api-server (exposes k8s api) kube-controller-manager (has multiple controllers for various k8s assets e.g. jobs, endpoints etc) kube-scheduler (handles scheduling of pods taking into account taints/tolerations, resource requests, constraints, affinities) cloud-controller-manager (manager that communicates with the hosting provider) cAdvisor (component that actually pulls container cpu/memory metrics from the cgroup linux fs) -\u0026gt; built into kubelet heapster/metrics server (serves resource metrics about pods/nodes via the kube-apiserver - used to handle horizontal pod autoscaling etc) kubeDNS/coreDNS - handles the DNS of the cluster.
CoreDNS, on startup, connects to the Kubernetes API server, then watches Service and Endpoints objects and maps them to DNS records accordingly Node components\nkubelet (agent that makes sure pods are running on the node) Kube-proxy Refer to the video: https://www.youtube.com/watch?v=BxDnv7MpJ0I (No longer valid - legacy userspace mode) Intercepts connections to the clusterIP of services and proxies them itself (No longer valid - legacy userspace mode) Does load balancing of traffic to k8s services In iptables mode, kube-proxy does not proxy traffic itself; it maintains iptables rules -\u0026gt; relies on Linux kernel netfilter features Kube-proxy does this by watching endpoints -\u0026gt; once an endpoint comes into existence, it is added as a destination that traffic can be routed to For requests that come in with DNS names -\u0026gt; resolved with coredns Container runtime (Default is now containerd) - doesn\u0026rsquo;t matter as long as the runtime supports the CRI (Container Runtime Interface) and runs OCI containers Container Networking Interface (CNI) - runs a daemon that sets up the (often overlay) network for the cluster Main responsibility of setting up the overlay network Manage IP addresses (IPAM) - an IP Address Management plugin is included in it Container Storage Interface (Managing of storage mounts) Watching of PV and PVC objects and the controller will report to the daemon to handle the mounting/unmounting as well as cleanup of it Reference: https://kubernetes.io/docs/concepts/overview/components/\nWe usually disable swap space when running Kubernetes 1.21 and earlier. Why? # Swap space is essentially disk space that acts as a temporary place that memory \u0026ldquo;overflows\u0026rdquo; into.
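The kubelet swap check discussed in this answer can be roughly reproduced by hand. On Linux, /proc/swaps always starts with a header line, and every further line is an active swap area, so anything beyond the header means swap is on. The sketch below is an illustration under that assumption, not the kubelet source. Running swapoff -a disables swap until the next reboot; the swap entry in /etc/fstab also needs removing to make that permanent.

```python
# Sketch: detect whether any swap area is active by parsing /proc/swaps (Linux).
# Format assumption: first line is a header, then one line per active swap area.
from pathlib import Path


def swap_enabled(proc_swaps_text):
    # Ignore blank lines; anything beyond the header row is an active swap area.
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) > 1


if __name__ == '__main__':
    swaps = Path('/proc/swaps')
    if swaps.exists():  # only present on Linux
        print('swap enabled:', swap_enabled(swaps.read_text()))
    else:
        print('/proc/swaps not found (not Linux?)')
```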
Disk is way slower compared to memory - enabling swap makes the performance of applications extremely variable and unstable; we can\u0026rsquo;t be sure which container has its memory written onto disk etc.\nAs mentioned in a blog post on the Kubernetes blog, there is a check to ensure that swap space is disabled (kubelet will not start if swap is enabled, unless the check is explicitly turned off, e.g. with the --fail-swap-on=false flag)\nReference:\nhttps://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/\nWhat are some of the ways to expose application endpoints within k8s externally? # Ingress Depends on how the Kubernetes cluster is set up and its cloud environment In Google Kubernetes Engine, a load balancer is actually created and routes are created on it. The routes lead back into the cluster, providing a single external IP address that routes traffic based on the ingresses defined. Nodeports Specified within Kubernetes Service objects Maps the port of the service to a port on every node (host machine) from the reserved range of 30000-32767 Load Balancer Specified within Kubernetes Service objects In a cloud based environment, there will be a controller monitoring the service objects being created that request load balancers. It will communicate with its respective cloud to create a load balancer and attach the external load balancer to that service. What\u0026rsquo;s the difference between statefulsets and deployments? And how do statefulsets allow databases to be deployed safely into Kubernetes?
# Statefulset pods have an ordinal number at the back of the pod name (e.g. \u0026lt;name\u0026gt;-0, \u0026lt;name\u0026gt;-1) Stable pod name/name reference (can call a specific pod in the stateful set) Pods in statefulsets can be accessed via headless services (no IP address for that service, you can access a specific pod via that service) If there are volumes to be mounted (via Persistent Volumes + Persistent Volume Claims) - each pod will have its own volume (unlike a deployment, where the persistent volume/volume claim is shared across the pods in the deployment). This is done via volumeClaimTemplates instead of a single volume claim How does an external network request reach into a pod via Ingress? # Coming\nHow is volume mounting handled in Kubernetes? # It depends, but nowadays the Container Storage Interface (CSI) seems to be becoming the mainstream way to get volume mounting into Kubernetes. Previously, a lot of the storage driver code was \u0026ldquo;in-tree\u0026rdquo; (part of the Kubernetes codebase) but it\u0026rsquo;s slowly being moved out.\nIn the Kubernetes cluster, you can define multiple \u0026ldquo;storage classes\u0026rdquo; - which you can then reference in a PVC/PV definition to choose which class of storage you want for the application. E.g. SSD (which is definitely more expensive) vs HDD storage class.\nSome storage types (e.g. Local) require manual creation of disks that need to be managed by an administrator. A lot of effort and very hard to scale out\nOther storage types (e.g. GCE-PD) support dynamic provisioning: a controller (provisioner) creates the disk and it is mounted to the node accordingly. This is based off the PVC - no need to create a PV for this (the provisioner creates the PV for the PVC)\nWhat is a headless service?
# A kubernetes service that is not assigned a cluster IP address Done by setting clusterIP to None In nslookup \u0026lt;service name\u0026gt;, it will list all pod IP addresses behind that service name Example of how a headless service is useful: a gRPC client can resolve all the pod IP addresses and load balance requests across the pods itself (with a normal service, gRPC\u0026rsquo;s long-lived HTTP/2 connection would stick to a single pod) Also used for StatefulSet applications. To hit one of the pods via the headless service - \u0026lt;pod name\u0026gt;.\u0026lt;full service name\u0026gt; When creating an operator - how are reconciliation loops started? # Within the xxxxx_controller.go file (based on the kubebuilder framework), there would usually be some code to build up the controller with the controller manager. The object is built up with various properties (builder pattern), but the most important one would be For(...) - that identifies the kind of object that the controller manages. The Complete method builds the controller and sets up the kubernetes \u0026ldquo;watch\u0026rdquo; functionality; the reconciliation loops then run once the manager is started.\nReference: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/builder/controller.go#L81 (May not be accurate)\nReference for watch documentation: https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes\nPossible youtube video on details of this: https://www.youtube.com/watch?v=PLSDvFjR9HY\nHow does one achieve multi-tenancy in a Kubernetes environment?
# The aim of multi-tenancy is to ensure that multiple application teams can share a single cluster while keeping the blast radius of each team\u0026rsquo;s resource usage small\nUse RBAC on the developer accounts that access the Kubernetes cluster so that they only have access to certain resources Separate teams by namespace Introduce resource quotas and apply them on a per-team basis Use pod anti-affinity rules to ensure that pods from different teams are not scheduled onto the same nodes Use taints and tolerations to reserve certain nodes for certain teams Why can\u0026rsquo;t you ping a service? # (NEED TO CONFIRM - BEHAVIOUR FOR THIS IS NOT CONSISTENT) https://nigelpoulton.com/why-you-cant-ping-a-kubernetes-service/\nDebugging steps for Kubernetes Applications # How do we start debugging an application that is deployed on Kubernetes?\nEnsure that the application works fine locally (it compiles and runs without issues) Ensure that the application works fine after it is packaged into a Docker image Check the Kubernetes manifest files/Helm chart to make sure they are correct (e.g. the right ports are set) If pods fail to start, describe them: kubectl describe pod \u0026lt;pod name\u0026gt; Check the liveness and readiness probes Describing the pods could show that secrets/configmaps are missing There could be a lack of resources in the cluster There could be no nodes that allow the pod to be scheduled (tolerations) Check the logs of the pods (there could be multiple pods) kubectl logs -f \u0026lt;pod name\u0026gt; -c \u0026lt;container name\u0026gt; It could be a database migration failure (the application may fail to start)
It could be a configuration error Try a \u0026ldquo;restart\u0026rdquo; first: kubectl delete pod \u0026lt;pod name\u0026gt; OR kubectl rollout restart deployment \u0026lt;deployment name\u0026gt; If other components have issues connecting to it Check whether you can open a shell in the container: kubectl exec -it \u0026lt;pod name\u0026gt; -- /bin/bash Check whether the application responds from within its own container Run the same check from other pods (the app could have been configured to listen only on \u0026ldquo;127.0.0.1\u0026rdquo;) If the other component reaches it via a service, ensure that the service selector matches the pod labels (NOT the deployment labels) More elaborate debugging steps (in case a shell is not present in the image) Copy the pod while adding a new container: kubectl debug \u0026lt;pod name\u0026gt; -it --image=ubuntu --share-processes --copy-to=debugging-pod Copy the pod while changing its command: kubectl debug \u0026lt;pod name\u0026gt; -it --copy-to=debugging-pod --container=\u0026lt;container name\u0026gt; -- sh Debug with a shell on the node: kubectl debug node/\u0026lt;node name\u0026gt; -it --image=ubuntu Additional cheatsheet for reference: https://kubernetes.io/docs/reference/kubectl/cheatsheet/ What are some of the security steps to harden Kubernetes deployments? # NSA Hardening Guide: https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/2716980/nsa-cisa-release-kubernetes-hardening-guidance/ Ensure each container has a read-only root filesystem Prevent containers from accessing host files using high GIDs Don\u0026rsquo;t use k8s hostPath volumes Prevent containers from escalating privileges (securityContext.allowPrivilegeEscalation: false) Prevent containers from running with root privileges (securityContext.runAsNonRoot: true in the k8s deployment) Prevent the service account token (k8s service account) from being auto-mounted on pods Set requests and limits to prevent runaway containers Do not run build processes (e.g.
Docker build) in production Kubernetes clusters (build processes can run arbitrary commands) Scan built Docker images for security issues Ensure you do not use the --privileged flag in Docker or Kubernetes Ensure that secrets are encrypted at rest and in transit (either encrypt etcd, or use a KMS or HashiCorp Vault) Prevent container drift (someone going into a running container and modifying it) Implement network policies (define which pods can talk to which pods) Databases # How does one go about \u0026ldquo;resharding\u0026rdquo; a database? # TODO: Add content\nUseful links # https://www.hairizuan.com/experimenting-with-ip-tables/ https://www.hairizuan.com/application-performance-isnt-the-most-important-factor-in-application-development/ https://www.hairizuan.com/basic-ssl-setup-server-and-client-ssl-certificate-setup/ ","date":"27 February 2022","externalUrl":null,"permalink":"/devops-interview-questions/","section":"Posts","summary":"This is a list of notes for possible interview questions with regards to devops roles. Interview questions for devops are particularly hard to cover since devops roles generally cover a broad range of topics and technologies. I will update this page as I see any interesting or “hard” questions to cover.\n","title":"Devops Interview Questions","type":"posts"},{"content":"While dealing with branded links in the course of my work, I wondered how this could be tackled in a Google Kubernetes Engine cluster. The situation I imagine needing to solve is this:\nThe application is to be deployed via a Helm chart Maybe due to legal/business reasons, it has been arranged such that 1 copy of the application serves only 1 customer. So if we have 5 customers, we would need to deploy the above Helm chart 5 times, each with a configuration tailored to that customer.
We don\u0026rsquo;t want to handle too many clusters, as running 1 cluster per customer adds too much maintenance overhead and cost (we would also need to manage monitoring for each one, etc). It would be best to deploy all those applications to a single cluster and then have some sort of mechanism to route each customer to their specific pods on the cluster. Each customer will access the application via their own branded link. It has been previously arranged that each customer purchases their own domain, and by accessing that domain, they should be able to reach the application deployed for them on that cluster. Deploying separate charts with different configurations # As an example, we can try deploying the following application from this repo: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicHelm\nThe reason for testing with this application is that we can alter the environment variable TARGET within the Helm chart to simulate different configurations of the application. The application returns the value of TARGET in its response, so by altering it, we get the same application with slightly different behaviour.\nWe can build the Docker image for this and prepare it for use by Google Kubernetes Engine by running the following commands:\n# Replace the XXXX with your own project ID docker build -t gcr.io/XXXX/yahoo:v1 . docker push gcr.io/XXXX/yahoo:v1 To change the configuration, alter the value of the TARGET environment variable in the deployment.yaml file of the Helm chart within the basic-app folder. Once done, we can deploy it via the following commands:\n# Assume that the environment value of TARGET was changed to yahoo helm upgrade --install yahoo ./basic-app # Assume that the environment value of TARGET was changed to lola helm upgrade --install lola ./basic-app This should hopefully get the application to run.
The initial version of the chart sets the replica count to 5, which might be too high for an example use case; you might want to lower the replica count.\nkubectl get pods Response of the above command:\nNAME READY STATUS RESTARTS AGE lola-basic-app-6957f859cf-4cw98 1/1 Running 0 3h41m lola-basic-app-6957f859cf-85n5d 1/1 Running 0 3h41m lola-basic-app-6957f859cf-gqqjp 1/1 Running 0 3h41m lola-basic-app-6957f859cf-jz5zn 1/1 Running 0 3h41m lola-basic-app-6957f859cf-mnsnm 1/1 Running 0 3h41m yahoo-basic-app-7bfcc945d7-65tkf 1/1 Running 0 4h3m yahoo-basic-app-7bfcc945d7-j2br9 1/1 Running 0 4h3m yahoo-basic-app-7bfcc945d7-pzjlr 1/1 Running 0 4h3m yahoo-basic-app-7bfcc945d7-qljcm 1/1 Running 0 4h3m yahoo-basic-app-7bfcc945d7-snxhf 1/1 Running 0 4h3m To check that the application is running properly, we can \u0026ldquo;exec\u0026rdquo; into a container and run some commands to make sure it works (note that this only works because the application deployed here allows it. Production-ready applications are usually configured to disallow this, to improve the security posture of such containerized deployments.)\nkubectl exec -it yahoo-basic-app-7bfcc945d7-snxhf -- /bin/bash Once inside the container, we can run the following command:\ncurl localhost:8080/ We should get the following response:\nroot@yahoo-basic-app-7bfcc945d7-snxhf:/home# curl localhost:8080 Hello World: Yahoo! With that, we can proceed to the next step of exposing the application externally.\nSingle entry point # There are many ways to handle the above situation. One way is to create 1 load balancer per customer in Google Cloud. Unfortunately, that might not be the wisest choice: a single load balancer already costs above $10-$15 (may vary).
The price overhead might be a little high if we take that approach.\nAnother approach is to use Kubernetes Ingress: under the hood, it sets up a single load balancer, and that load balancer is configured with the various incoming traffic (ingress) rules that decide how each incoming request is handled.\nYou can see an example of such an Ingress below. (Some ingress controllers can merge rules from multiple Ingress definition YAML files; on GKE, however, each Ingress resource provisions its own load balancer, so all the rules are kept in one file here.) The following Ingress file defines 2 branded domains that customers can access.\n# Save the file as all-ingress.yaml apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: test-ingress spec: rules: - host: yahoo.example.com http: paths: - path: / pathType: Prefix backend: service: name: yahoo-basic-app port: number: 8080 - host: lola.example.com http: paths: - path: / pathType: Prefix backend: service: name: lola-basic-app port: number: 8080 We find the services that the Ingress should send traffic to by listing the services on Kubernetes:\nkubectl get services Response for the above command:\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.116.0.1 \u0026lt;none\u0026gt; 443/TCP 4h55m lola-basic-app ClusterIP 10.116.1.211 \u0026lt;none\u0026gt; 8080/TCP 3h53m yahoo-basic-app ClusterIP 10.116.6.9 \u0026lt;none\u0026gt; 8080/TCP 4h50m We apply the ingress rules to the Kubernetes cluster by running the following command:\nkubectl apply -f all-ingress.yaml Once applied, we can check the state of the Ingress.\nkubectl get ingress Note that on GKE, it takes about 2-5 minutes for the IP to show up in the ADDRESS field. Under the hood, GKE calls Google APIs to create a load balancer on our behalf and then applies the rules to said load balancer.
It then works to keep the ingress rules synced up with the load balancer.\nNAME CLASS HOSTS ADDRESS PORTS AGE test-ingress \u0026lt;none\u0026gt; yahoo.example.com,lola.example.com 34.111.138.89 80 175m Testing # To test this out, we can add the IP address that was assigned to the Ingress to our /etc/hosts file (Windows has an equivalent hosts file as well; we are simply mapping an IP address to a domain name)\n34.111.138.89 yahoo.example.com 34.111.138.89 lola.example.com Append the above to the mentioned file. Be careful when handling this file: other programs could have added entries to it, and carelessly manipulating /etc/hosts may cause issues for those programs. To be safe, make a backup of the /etc/hosts file first. If things go south, you can replace the altered /etc/hosts with the backup file.\nWe can now run curl against the domains yahoo.example.com and lola.example.com, and each should return the expected response from its deployed application\ncurl yahoo.example.com # Response # Hello World: Yahoo! curl lola.example.com # Response # Hello World: Lola! ","date":"13 February 2022","externalUrl":null,"permalink":"/kubernetes-ingress-for-applications-with-branded-links-on-gke/","section":"Posts","summary":"While dealing with branded links in the course of my work, I wondered how this could be tackled in a Google Kubernetes Engine cluster. The situation I imagine needing to solve is this:\n","title":"Kubernetes Ingress for applications with branded links on GKE","type":"posts"},{"content":"This is a quick sample tool to retrieve bus arrivals in Singapore. To use it, we first need to find the Bus Stop ID or Bus Stop Code of the stop we\u0026rsquo;re taking the bus from.
After you key it in, the tool fetches the records from LTA Datamall\u0026rsquo;s real-time bus arrival API and presents them.\nThe Bus Stop IDs/Codes that you\u0026rsquo;ve keyed in here are stored within the browser via \u0026ldquo;localstorage\u0026rdquo;; a page refresh clears all the displayed bus arrival times (we would want more up-to-date records anyway). However, a refresh button is available for each bus stop to refetch and repopulate its records. See the tool below.\nIf you\u0026rsquo;re testing the tool outside of bus service hours, you can try using Bus Stop IDs 99999, 99998 and 99997. They are sample, \u0026ldquo;fake\u0026rdquo; bus stops, and I\u0026rsquo;m not aware of whether such bus stops actually exist.\nThe tool may have some bugs, considering that it was built over a weekend; once I have a bit more time to spare, I\u0026rsquo;ll consider fixing the bugs and improving the look in the following weeks.\nBuilding the Bus Arrival tool - Overview # There are multiple parts to building the Bus Arrival tool; we first need to build a backend. The backend retrieves the records from LTA Datamall, and it exists partly to protect the LTA Datamall secrets that we use to retrieve the records. I\u0026rsquo;m building it in Golang as I\u0026rsquo;m most familiar with it, and its static typing makes it the easiest for me to maintain. I\u0026rsquo;ll explain more about it in the next section.\nSince we\u0026rsquo;re embedding the frontend into this blog post, we can rely on Elm (as there are already processes/code in place to make this easier).
I\u0026rsquo;ll probably be a heavy user of the tool, so I want certain features, such as saving bus stop IDs so that I don\u0026rsquo;t need to keep looking up the IDs of the bus stops I frequent.\nBuilding the backend of Bus Arrival tool - Golang backend # You can find the Golang code referenced here at the following link:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Apps/bus-arrival\nThe backend of this embedded Bus Arrival web application is built using Golang. The first thing to consider when building the backend is figuring out how to access the LTA Datamall APIs. There is a guide to their usage here: https://datamall.lta.gov.sg/content/dam/datamall/datasets/LTA_DataMall_API_User_Guide.pdf\nTo get an account key, we need to sign up and request one from LTA Datamall\nWe can quickly check that the API works with curl:\ncurl -H \u0026#39;AccountKey: XXXX\u0026#39; \u0026#39;http://datamall2.mytransport.sg/ltaodataservice/BusArrivalv2?BusStopCode=83139\u0026#39; Do make sure that the service is not undergoing any downtime while testing this. That said, downtime should usually only happen in the middle of the night.\nTo access the LTA Datamall APIs via Golang, we need to create the required structs to parse the responses returned from the LTA Datamall endpoint, or rely on a Golang library that happens to do that heavy lifting, such as: https://github.com/yi-jiayu/datamall\nThe rest of the service is fairly straightforward; the main bit is accessing, parsing and simplifying the responses returned from the Bus Arrival endpoint of the LTA Datamall APIs. The API is built with the Gorilla Golang library, which provides an easy way to do routing for such REST API services.
We also provide a \u0026ldquo;healthz\u0026rdquo; endpoint behind the /api/.. route, which makes it easy to check from the frontend code that the route is accessible.\nDo take note that the Golang code here DOES NOT have any CORS configured. I will expand a bit on that point in the next section on building the frontend of this Bus Arrival tool.\nAnother interesting point is how the API paths are structured. We are using /api/lta-datamall/v1/bus-arrival, although generally I like to use /api/v1/bus-arrival, with the version right after api, since APIs should be versioned for our sanity. The reason is that I need to proxy it during deployment, and it would be hard to proxy based on the latter convention. With lta-datamall in the path, I can proxy based on the \u0026ldquo;service\u0026rdquo; rather than on each individual \u0026ldquo;service endpoint\u0026rdquo;: a single /api/lta-datamall/ prefix can be forwarded to this backend without affecting any other /api/ routes.\nBuilding the frontend of Bus Arrival tool - Elm # I really hate tinkering around with CORS. There are too many weird quirks that I need to know about, and I generally don\u0026rsquo;t handle frontend work most of the time. The first time I encountered it, I spent many hours trying to debug it; the behaviour differs across web browsers, and across different versions of the same browser. You can read further on this in another post: CORS with Golang Microservices and Elm Frontend is difficult\nAs mentioned in that blog post, I\u0026rsquo;ve decided to avoid CORS wherever I can. As in that post, my Golang backend runs in a Docker container exposed on port 8880, while my Elm frontend is exposed on port 8000. The two services are proxied via nginx, installed on my workstation, on port 8080.
Because the frontend and backend are served from the same domain and the same port, CORS restrictions don\u0026rsquo;t kick in, so I don\u0026rsquo;t need to handle CORS issues.\nThe one catch is that when proxy_pass is built from variables (as with the regex capture $1 here), nginx doesn\u0026rsquo;t automatically pass the query arguments along. We need to construct the full URL, including the query arguments, by adding $is_args$args:\n... location ~ ^/api/(.*) { proxy_pass http://localhost:8880/api/$1$is_args$args; } ... Without that, query arguments are silently dropped, which is definitely an issue here.\nWe can then test the frontend by accessing it on port 8080, the port that nginx listens on. The frontend calls the backend via port 8080, and nginx forwards those requests to port 8880 accordingly.\nDeploying the application # The backend is deployed to Google Cloud Run, using Google Cloud Secret Manager for the secrets. The procedure was done manually (no Google Cloud Build was used here). The image used to run the app on Google Cloud Run is stored on Google Container Registry. In case the names of those products are a bit too vague for you:\nGoogle Cloud Run: One of Google Cloud Platform\u0026rsquo;s serverless platforms. TLDR - Docker as a service. Build an image, push it to a container registry and then tell this product to run that image for you. It comes with plenty of limitations, but it should be sufficient for a fair number of use cases (since many use cases are just simple CRUD applications) Google Secret Manager: As the name implies, it is a tool to manage secrets that you would inject into servers/containers on Google Cloud Platform. For running servers and containers to access the secrets, the service account in charge of that server/container needs to be granted access to the secret. Google Cloud Build: One of Google Cloud Platform\u0026rsquo;s CI/CD tools.
(The number of such tools increases every year; as of the date of writing, there is also Google Cloud Deploy, which has a different focus). Google Cloud Build is essentially like \u0026ldquo;Jenkins\u0026rdquo;, but more restricted and focused: it uses Docker images to build the necessary artifacts and deploy said artifacts to production. Google Container Registry: As the name implies, it is a registry that stores container images. There is another product, Google Artifact Registry, which is more \u0026ldquo;generalized\u0026rdquo; but unfortunately way more expensive. The first step before deploying is to build the image that we will run for this Bus Arrival app:\ndocker build -t gcr.io/\u0026lt;project id\u0026gt;/bus-arrival:v1 . After which, we can push it to Google Container Registry:\ndocker push gcr.io/\u0026lt;project id\u0026gt;/bus-arrival:v1 The next steps are UI-based:\nRegister the LTA Datamall secret in Google Secret Manager Create a new Google Cloud Run service and select the newly pushed image Ensure that the number of container instances allowed to run is low (we don\u0026rsquo;t expect a large number of requests for this) Add the secret to be used in the app (it should be under the secrets tab) and choose the right version of the secret from Secret Manager. Ensure that the right amount of CPU/Memory is allocated, which in our case should also be quite low. Deploy and run the service We can check that the application works by pinging the healthz endpoint:\ncurl \u0026lt;cloud run endpoint\u0026gt;/api/lta-datamall/v1/healthz That endpoint exists only to check that the API is accessible.\nThe next part is the critical bit. Remember that the backend doesn\u0026rsquo;t support CORS? This comes back to bite here. We need to configure things such that the backend is served from the SAME domain as the frontend, which in our case is hairizuan.com. How should we do this?
Do we do it via Nginx?\nA bit of context: this blog is deployed on Netlify, so we can\u0026rsquo;t exactly use nginx since the domain itself is served by Netlify. Only if I moved away from Netlify could Nginx be used (similar to how the whole tool is tested locally). Luckily, Netlify provides a proxy feature: https://docs.netlify.com/routing/redirects/rewrites-proxies/\nTo make it work, we need to add the following to the netlify.toml of the current repo:\n[[redirects]] from = \u0026#34;/api/lta-datamall/*\u0026#34; to = \u0026#34;https://\u0026lt;cloud run endpoint\u0026gt;/api/lta-datamall/:splat\u0026#34; force = false status = 200 The important part here is the status field: it must be 200 (a rewrite/proxy), not 301. With 301, the browser is redirected to the Cloud Run domain, and the frontend runs into CORS issues once more since it is retrieving resources from another domain.\nConclusion # This was a pretty fun project, but also one I had been pondering for quite a while. It came about because I want to check bus arrival times, but I\u0026rsquo;m too lazy/cautious to download an application to do this (I\u0026rsquo;ve tried one or two apps for this before, but they were buggy and didn\u0026rsquo;t fit my small requirement; too many buttons/fields to fill in just to get the information).\nAs a final touch, I created a shortcut for this on my phone, and now it is somewhat \u0026ldquo;app-like\u0026rdquo;: a small tool that I can quickly access to get the information I need.\n","date":"3 February 2022","externalUrl":null,"permalink":"/bus-arrival-app-singapore/","section":"Posts","summary":"This is a quick sample tool to retrieve bus arrivals in Singapore. To use it, we first need to find the Bus Stop ID or Bus Stop Code of the stop we’re taking the bus from.
After you key it in, the tool fetches the records from LTA Datamall’s real-time bus arrival API and presents them.\n","title":"Bus Arrival App - Singapore","type":"posts"},{"content":"Database migration is a critical part of running and operating applications. In Golang, it is appealing to rely on ORM (Object Relational Mapping) libraries, which map structs to tables within the database. One example of an ORM library, found on the first page of Google, is GORM.\nThe GORM package allows application developers to mostly focus on application logic and takes over some of the \u0026ldquo;administrative\u0026rdquo; work of reading data from the cursors returned in database responses into Golang structs. The amount of effort this saves is the reason why ORMs continue to exist, despite the various downsides of using such libraries. Essentially, as long as a developer understands the limitations of such libraries and how functions within the ORM translate into SQL queries, the library is a useful tool in a developer\u0026rsquo;s arsenal.\nHowever, one feature that is \u0026ldquo;appealing\u0026rdquo; to use but definitely bad to rely on is GORM\u0026rsquo;s auto-migrate. Refer to the following page: https://gorm.io/docs/migration.html\nAuto-migrate is extremely convenient: it generates the SQL statements that create the tables and columns the application needs to run. It is nice to use for bootstrapping applications and getting the database tables in place, so that we can focus on writing application logic rather than the administrative effort of writing proper migration scripts.\nHowever, as one would guess by now, this convenience comes at a cost. Auto-migrate doesn\u0026rsquo;t track which version of the database schema our application is using.
We have little control over, and little visibility into, how the auto-migrate feature alters the database tables. In a large number of cases it would only add database columns, but we also need to take note that this lack of control could lead to inconsistent database migrations. If we need to apply database migrations across multiple datacentres over the lifetime of the application, the database structure could end up inconsistent, and this inconsistency could easily open the door to bugs that only exist in certain datacentres (for example, an accidental reference to a column that used to be needed, but which auto-migrate no longer creates because the application structs have been altered).\nIn this blog post, I will cover an alternative approach to database migration using the golang-migrate library. It will be demonstrated using Cloud SQL (accessed via the Cloud SQL Proxy), with the application hosted on a Google Compute Engine Virtual Machine.\nCreating the environment # First, let\u0026rsquo;s get a Google Compute Engine VM. Please make sure to enable the Cloud SQL API access scope for the Virtual Machine\nAlso, for this whole thing to work, we need to ensure that the following API is enabled for the Google Cloud Project: Cloud SQL Admin API\nThis section covers installing the mysql client, which we may or may not need here; it is mostly used for debugging purposes only, so we can probably skip this step safely.\nsudo apt update \u0026amp;\u0026amp; sudo apt install -y wget wget https://dev.mysql.com/get/mysql-apt-config_0.8.20-1_all.deb sudo dpkg -i mysql-apt-config_0.8.20-1_all.deb sudo apt update sudo apt install -y mysql-client This next step is needed: install the Cloud SQL proxy. The Cloud SQL proxy is needed when the Cloud SQL instance is not restricted to only one VPC.
For example, this is the case if the same database needs to be accessed from Compute Engine VMs in multiple VPCs.\nsudo apt update \u0026amp;\u0026amp; sudo apt install -y wget wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy sudo chmod +x cloud_sql_proxy With that, the VM is mostly prepared. The next step is to create a Cloud SQL MySQL instance. We will use a \u0026ldquo;public\u0026rdquo; IP connection instead of a \u0026ldquo;private\u0026rdquo; one. We also need to create the database within that MySQL instance manually. After that, we can copy the instance connection name from the overview page of the Cloud SQL instance and pass it to the proxy on our VM:\n./cloud_sql_proxy -instances=\u0026lt;sql connection name\u0026gt;=tcp:3306 As one can guess, the Cloud SQL proxy that we downloaded and started is just a binary that forwards (\u0026ldquo;proxies\u0026rdquo;) the traffic meant for the database to the Cloud SQL instance. The proxy ensures that the database connection is secure and authorized, so we can connect to MySQL the simple way that most tutorials do, without having to secure the connection ourselves. The proxy allows us to just send data to \u0026ldquo;localhost:3306\u0026rdquo;, even though the database instance is definitely not on that machine.\nTo test this out, we can run it in a shell, but I\u0026rsquo;d imagine the better way to manage this would be to put the cloud_sql_proxy binary under systemd\u0026rsquo;s control (managed via systemctl).
This ensures that the binary is restarted should it crash.\nRunning migrations # There are multiple ways of doing this; one way is to download the golang-migrate CLI tool.\ncurl -L https://github.com/golang-migrate/migrate/releases/download/v4.15.1/migrate.linux-amd64.tar.gz | tar xvz An important thing to take note of is that it is best to ensure that all migrations are idempotent: if migrations are accidentally \u0026ldquo;rerun\u0026rdquo;, they should not mess up the database schema. Upgrades/downgrades of the database schema should impact the running application as little as possible, so that we don\u0026rsquo;t need to introduce downtime just to upgrade the application or run database schema upgrades.\nWith the golang-migrate CLI tool, we point it at the SQL migration scripts to alter the database schema accordingly; the migrate command runs through all of the SQL migration scripts. The problem with this methodology is that we need to sync the SQL migration scripts over to the machine, or provide some sort of online link to said migration scripts (which feels less secure), and that creates an additional administrative step to handle.\nAn alternative to using the golang-migrate CLI tool directly is to embed it into the application that we\u0026rsquo;re building. I have a reference application here: https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicMigrate\nIn this application, we rely on Golang\u0026rsquo;s capability to embed files into the built binary (previously, you would have needed a third-party package to do this, but now it is natively supported). The embed feature is available from Golang 1.16 onwards.
Refer to the Go 1.16 release announcement here: https://go.dev/blog/go1.16\nThe golang-migrate package supports this feature; we just need to import the following package into our binary as well:\n\u0026#34;github.com/golang-migrate/migrate/v4/source/iofs\u0026#34; This package lets golang-migrate read the SQL migrations from the embedded files.\nThe sample binary here is built with 2 subcommands. One is the migrate subcommand, which invokes the functions that run the migration. The one built here is on the \u0026ldquo;simpler\u0026rdquo; side, migrating straight to the latest schema, but it is also possible to control the number of migrations to run upwards (golang-migrate provides m.Steps(n) for this). This might be needed in the scenario where an extremely old version of the application is running in some datacentre: we can sync the latest version of the application, open up a maintenance window, and upgrade the schema one migration at a time to the latest (if there are issues, we can pause there and identify the troublesome migration version)\nCode for reference application # Since the application is still fairly simple, I can add it to this blog post with little issue.
However, updates won\u0026rsquo;t be propagated here, so it is best to refer to the GitHub link for the latest version of the source code discussed in this post.\npackage main import ( \u0026#34;encoding/json\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;io/ioutil\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;os\u0026#34; \u0026#34;strconv\u0026#34; _ \u0026#34;github.com/go-sql-driver/mysql\u0026#34; \u0026#34;github.com/gorilla/mux\u0026#34; gormMySQL \u0026#34;gorm.io/driver/mysql\u0026#34; \u0026#34;gorm.io/gorm\u0026#34; \u0026#34;embed\u0026#34; migrate \u0026#34;github.com/golang-migrate/migrate/v4\u0026#34; _ \u0026#34;github.com/golang-migrate/migrate/v4/database/mysql\u0026#34; \u0026#34;github.com/golang-migrate/migrate/v4/source/iofs\u0026#34; \u0026#34;github.com/spf13/cobra\u0026#34; ) //go:embed migrations/* var fs embed.FS type User struct { ID int `gorm:\u0026#34;primaryKey;autoIncrement\u0026#34;` FirstName string LastName string } var rootCmd = \u0026amp;cobra.Command{ Use: \u0026#34;app\u0026#34;, Short: \u0026#34;This is a sample golang migrate application\u0026#34;, } var migrateCmd = \u0026amp;cobra.Command{ Use: \u0026#34;migrate\u0026#34;, Short: \u0026#34;Run database migration\u0026#34;, Run: func(cmd *cobra.Command, args []string) { d, err := iofs.New(fs, \u0026#34;migrations\u0026#34;) if err != nil { log.Fatal(err) } m, err := migrate.NewWithSourceInstance( \u0026#34;iofs\u0026#34;, d, \u0026#34;mysql://user:password@(localhost:3306)/application\u0026#34;) if err != nil { panic(fmt.Sprintf(\u0026#34;unable to connect to database :: %v\u0026#34;, err)) } if err := m.Up(); err != nil \u0026amp;\u0026amp; err != migrate.ErrNoChange { log.Fatal(err) } }, } type UserGet struct { DB *gorm.DB } func (h UserGet) ServeHTTP(w http.ResponseWriter, r *http.Request) { vars := mux.Vars(r) rawUserID := vars[\u0026#34;userID\u0026#34;] userID, err := strconv.Atoi(rawUserID) if err != nil { w.WriteHeader(http.StatusBadRequest) w.Write([]byte(\u0026#34;bad request\u0026#34;)) return } var u User 
result := h.DB.First(\u0026amp;u, userID) if result.Error != nil { w.WriteHeader(http.StatusInternalServerError) w.Write([]byte(\u0026#34;bad connection\u0026#34;)) return } rawResp, _ := json.Marshal(u) w.WriteHeader(http.StatusOK) w.Write(rawResp) } type UserCreate struct { DB *gorm.DB } func (h UserCreate) ServeHTTP(w http.ResponseWriter, r *http.Request) { raw, err := ioutil.ReadAll(r.Body) if err != nil { w.WriteHeader(http.StatusBadRequest) w.Write([]byte(\u0026#34;bad request\u0026#34;)) return } type userCreate struct { FirstName string `json:\u0026#34;first_name\u0026#34;` LastName string `json:\u0026#34;last_name\u0026#34;` } var uc userCreate if err := json.Unmarshal(raw, \u0026amp;uc); err != nil { w.WriteHeader(http.StatusBadRequest) w.Write([]byte(\u0026#34;bad request\u0026#34;)) return } u1 := User{FirstName: uc.FirstName, LastName: uc.LastName} result := h.DB.Create(\u0026amp;u1) if result.Error != nil { w.WriteHeader(http.StatusInternalServerError) w.Write([]byte(\u0026#34;bad connection\u0026#34;)) return } rawResp, _ := json.Marshal(u1) w.WriteHeader(http.StatusOK) w.Write(rawResp) } var serverCmd = \u0026amp;cobra.Command{ Use: \u0026#34;server\u0026#34;, Short: \u0026#34;Run server\u0026#34;, Run: func(cmd *cobra.Command, args []string) { fmt.Println(\u0026#34;server start\u0026#34;) dsn := \u0026#34;user:password@tcp(127.0.0.1:3306)/application\u0026#34; db, err := gorm.Open(gormMySQL.Open(dsn), \u0026amp;gorm.Config{}) if err != nil { panic(fmt.Sprintf(\u0026#34;unable to connect to database :: %v\u0026#34;, err)) } r := mux.NewRouter() r.Handle(\u0026#34;/user\u0026#34;, UserCreate{DB: db}).Methods(\u0026#34;POST\u0026#34;) r.Handle(\u0026#34;/user/{userID}\u0026#34;, UserGet{DB: db}).Methods(\u0026#34;GET\u0026#34;) srv := \u0026amp;http.Server{ Handler: r, Addr: \u0026#34;:8888\u0026#34;, } log.Fatal(srv.ListenAndServe()) }, } func init() { rootCmd.AddCommand(migrateCmd) rootCmd.AddCommand(serverCmd) } func main() { if err := rootCmd.Execute(); err != nil { fmt.Fprintln(os.Stderr, err) os.Exit(1) } } ","date":"17 January 
2022","externalUrl":null,"permalink":"/database-migration-via-cloud-sql-proxy-for-cloud-sql-in-google-compute-engine-vm/","section":"Posts","summary":"Database migration is kind of a critical bit when it comes to running and operating applications. In Golang, it is kind of appealing to rely on ORM (Object Relational Mapping) libraries. It allows one to kind of map structs to tabular structures within the database storage. One such example of an ORM library that I’ve found on the first page of Google is GORM.\n","title":"Database migration via Cloud SQL Proxy for Cloud SQL in Google Compute Engine VM","type":"posts"},{"content":"I am still building my personal pet project: https://github.com/hairizuanbinnoorazman/slides-to-video. The aim of this project is a personal one: to build a set of microservices that can be deployed in various ways, such as locally via Docker Compose, to Kubernetes, or to the serverless Cloud Run platform on Google Cloud Platform. A previous blog post described an initial part of this journey: Lessons on building the project - Part 1\nAs a reminder, the set of microservices/applications being built is here: Slides to Video application. It takes in a PDF containing slides, and the user adds a script to each slide (used to narrate that slide). At the end of the process, the user can download a fully voiced-over video file.\nThe architecture can be summed up as follows:\nFrontend (Elm)\nAPI Server - Manager (Golang)\nPDF Splitter Worker (Golang)\nImage to Video Worker (Golang)\nVideo Concatenation Worker (Golang)\nThe manager works with the workers by sending messages to a queue system, which depends on how the application is deployed. If the docker-compose setup within the repo is used, the queue system is NATS.\nMigrating from JWT Tokens to Cookies # Initially, the application was built to use JWT tokens to pass the authentication token from the backend to the frontend. 
It does seem like the modern way of doing things: whenever single page applications (SPAs) are mentioned, JWT tokens come up too. FYI, JWT stands for JSON Web Token. It\u0026rsquo;s an open standard that defines a way to send data (usually authorization data) to and from the server. It is best to refer to the JWT website.\nUsing JWT went fine until I needed to fetch images that are restricted to specific users. Such resources are usually requested via GET requests, especially images. If a website sets cookies, the browser sends those cookies along with every request to that site. Unfortunately, this is not the case for JWT tokens. They travel in a header, and headers are not automatically attached to every request. The \u0026lt;img\u0026gt; HTML tag offers no way to modify how it requests images from the server. This makes it quite difficult to ensure that only authenticated users can retrieve the images belonging to them.\nTo work around this, I considered having the Elm application download the images, store them in the app state and render them on the page once downloaded. However, this feels like a lot of unnecessary complexity just to follow the \u0026ldquo;JWT\u0026rdquo; approach of handling authorization on the frontend. That\u0026rsquo;s not the only issue, either; the elm-image library doesn\u0026rsquo;t make the experience an easy or pleasant one: https://github.com/justgook/elm-image/issues/9. If we just used cookies, we wouldn\u0026rsquo;t need to add authorization headers to every request. The required tokens would be sent automatically, which would make the frontend code considerably simpler.\nWith that, I decided to switch to cookies for sending tokens between the frontend and backend. 
However, the various microservices also communicate with the API server, and for that, JWT tokens are the ideal way to pass data between the microservices (this reduces the need for the microservices to keep checking the database to see whether a user is authorized to perform an action).\nIssues with CORS and cookies # To allow the Elm frontend to communicate with the Golang backend, we first need to enable CORS in the backend. An important note: even though both backend and frontend are on the localhost domain, they use different ports. That difference alone is enough for the browser to treat them as different origins, which is why we need to set up CORS in the first place.\nThe gorilla set of libraries provides CORS support. An initial configuration that more or less works (just enough for the frontend to communicate with the backend) looks something like this:\ncors := handlers.CORS( handlers.AllowedHeaders([]string{\u0026#34;Content-Type\u0026#34;, \u0026#34;Authorization\u0026#34;}), handlers.AllowedOrigins([]string{\u0026#34;*\u0026#34;}), handlers.AllowedMethods([]string{\u0026#34;GET\u0026#34;, \u0026#34;POST\u0026#34;, \u0026#34;PUT\u0026#34;}), )\nThis is where problems start to arise. With the above configuration, the browser treats the setup as \u0026ldquo;unsafe\u0026rdquo; and will not set the cookies sent from the backend. The main culprit is the * value for AllowedOrigins: browsers do not accept a wildcard origin for requests that carry credentials such as cookies. 
That is understandable: ideally, the backend should not trust every frontend from another origin that tries to reach the server.\nI tried the following configuration instead. A Stack Overflow post mentioned the need to set Access-Control-Allow-Credentials to true as well, so I added that setting too.\ncors := handlers.CORS( handlers.AllowedHeaders([]string{\u0026#34;Content-Type\u0026#34;, \u0026#34;Authorization\u0026#34;, \u0026#34;Set-Cookie\u0026#34;}), handlers.AllowedOrigins([]string{\u0026#34;http://localhost:8000\u0026#34;}), handlers.AllowedMethods([]string{\u0026#34;GET\u0026#34;, \u0026#34;POST\u0026#34;, \u0026#34;PUT\u0026#34;, \u0026#34;OPTIONS\u0026#34;}), handlers.AllowCredentials(), )\nOther settings I remember reading about involved the cookie itself. This was based on the following page: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie. It mentions that for a cookie to be sent with cross-site requests, its SameSite attribute has to be set to None. However, for the browser to accept that, the cookie\u0026rsquo;s Secure attribute must also be set, and a Secure cookie is only sent over HTTPS. That sounds too troublesome just for local development. 
All these settings make sense in production, but requiring such unusual configuration for development makes developing locally extremely unappealing.\ncookie := \u0026amp;http.Cookie{ Name: h.Auth.CookieName, Value: encoded, Path: \u0026#34;/\u0026#34;, Secure: true, HttpOnly: true, Domain: \u0026#34;localhost\u0026#34;, SameSite: http.SameSiteNoneMode, }\nUnfortunately, even after tinkering with all these settings, I still couldn\u0026rsquo;t get the cookie set on the frontend site, and I definitely didn\u0026rsquo;t want to hack around just to get it working locally. Ideally, local development should work \u0026ldquo;out of the box\u0026rdquo;, without huge changes to the codebase.\nLuckily, the accepted answer to this Stack Overflow question hinted at an escape hatch that makes this situation easier to handle: https://stackoverflow.com/questions/46288437/set-cookies-for-cross-origin-requests\nProxy to both backend and frontend # As mentioned in the previous section, if the ports differ, the browser treats the two as \u0026ldquo;different\u0026rdquo; origins. So, if we proxy requests to both frontend and backend through a single address (for example, nginx on localhost:8080), we don\u0026rsquo;t need any CORS configuration at all.\nFirst, here is how I usually develop this locally:\nThe API server, the other backend workers, the database (MySQL) and the queue server (NATS) are all set up via docker-compose and exposed on specific ports. The API backend is exposed on port 8880 of the local workstation.\nFor frontend development, I generally stick to vanilla elm reactor, which usually serves on port 8000.\nInitially, I wanted to run the proxy as an nginx container as part of the docker-compose setup, but a little thought ruled that out. 
If the proxy runs inside the docker-compose setup, the main question is how to get traffic from the proxy container back out to a specific port on the local workstation (which requires some Docker networking magic).\nThe easiest way to get the whole setup working is to install nginx on the workstation and add nginx rules that forward requests to the port exposed by elm reactor as well as the port exposed by the backend API server. The rules should all fall under the same \u0026ldquo;server\u0026rdquo; block in nginx.\nhttp { ... server { listen 8080; server_name localhost; ... # This is for slides to video api location /status { proxy_pass http://localhost:8880/status; } location /api/ { proxy_pass http://localhost:8880; } location / { proxy_pass http://localhost:8000; } ... } }\nThe /api/ rule uses proxy_pass without a URI part, so nginx forwards the original request URI, query string included. (A regex location that rebuilds the path from a capture such as $1 would drop the query string.)\nImplications to deployment # Of course, choosing to avoid CORS has consequences. The backend and frontend have to be deployed and exposed under the same domain, so some sort of proxy is definitely needed.\nThe simplest case would be to deploy everything onto a single VM and point the domain at the public IP address of the VM. The Elm application has to be compiled into static HTML and JavaScript, and an HTTP server is needed to serve it.\nIf this application is deployed in a cloud environment with the components on separate VMs, a load balancer (a pretty common tool nowadays) can be used. Its path rules can send the \u0026ldquo;/\u0026rdquo; path to the frontend and paths starting with \u0026ldquo;/api\u0026rdquo; to the backend.\nIf this application is deployed to Kubernetes instead, it can be served from a single domain by making use of Kubernetes ingress. 
Similar to the load balancer in a cloud environment, a Kubernetes ingress lets us map certain paths to the frontend and send the rest of the traffic to the backend.\n","date":"2 January 2022","externalUrl":null,"permalink":"/cors-with-golang-microservices-and-elm-frontend-is-difficult/","section":"Posts","summary":"I am still building my personal pet project: https://github.com/hairizuanbinnoorazman/slides-to-video. The aim of this project is a personal one: to build a set of microservices that can be deployed in various ways, such as locally via Docker Compose, to Kubernetes, or to the serverless Cloud Run platform on Google Cloud Platform. A previous blog post described an initial part of this journey: Lessons on building the project - Part 1\n","title":"CORS with Golang Microservices and Elm Frontend is difficult","type":"posts"},{"content":"","date":"2 January 2022","externalUrl":null,"permalink":"/categories/hugo/","section":"Article Categories","summary":"","title":"Hugo","type":"categories"},{"content":"","date":"2 January 2022","externalUrl":null,"permalink":"/tags/hugo/","section":"Technology Tags","summary":"","title":"Hugo","type":"tags"},{"content":"While building Elm-based frontends, I decided to take the opportunity to learn how to build a chat application. Truthfully, I\u0026rsquo;ve never built one before (nor have I needed to). But it seems like an interesting programming exercise for understanding how such applications are built, deployed, scaled and managed. For the frontend, I\u0026rsquo;m set on using Elm (you\u0026rsquo;ve probably seen a previous post on my \u0026ldquo;dislike\u0026rdquo; for other JavaScript-based frameworks, which is essentially all the popular ones on the market). 
For the backend, I will probably stick to Golang since that is the language I\u0026rsquo;m most comfortable with (all hail statically typed languages).\nThe backend codebase for the chat application can be found here:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicWebsocket\nThe backend code is modified from this example:\nhttps://github.com/gorilla/websocket/tree/master/examples/chat\nThe codebase relies heavily on various components from the gorilla Golang libraries, including gorilla/mux and gorilla/securecookie. You\u0026rsquo;ll probably understand why those libraries are used as you read on.\nThe main aim of this sample chat application is to create an Elm component that can be embedded into an HTML page. Unfortunately, I will not be embedding the Elm component in this blog post, as it doesn\u0026rsquo;t have enough features to showcase yet (although there were plenty of interesting things to run into during development). Most likely, a future blog post will include a demonstration as well as other features I intend to add to this sample application (e.g. persisting chat messages, creating multiple chat rooms).\nThe rest of this blog post does not cover every detail of the codebase; instead, it covers the more interesting aspects. Most of these sections are the parts I tripped over while building the Golang backend.\nAdding CORS # The first step is to make it possible for our Elm component to talk to our backend. This is done via CORS (Cross-Origin Resource Sharing). It is needed because the frontend is served from a different origin than the backend, and by default, the browser does not trust such cross-origin requests. 
As a reminder, the frontend is built with Elm and is injected into an HTML page as a Single Page Application (SPA).\nIn Golang, we can easily resolve this by importing a CORS library. Refer to the code highlighted below; you can find it in the main.go file of that folder in the repo.\nimport \u0026#34;github.com/rs/cors\u0026#34; ... c := cors.New(cors.Options{ AllowedOrigins: []string{\u0026#34;*\u0026#34;}, AllowedMethods: []string{\u0026#34;*\u0026#34;}, }) ...\nDo note that this configuration is \u0026ldquo;very bad\u0026rdquo;: it is essentially an \u0026ldquo;allow all\u0026rdquo; configuration. It may be OK for testing, but we definitely need to restrict which origins can contact the server and which methods they can use, so that only the right frontend can access the backend.\nChecking frontend while creating websocket connection # WebSocket connections are not covered by CORS, so the browser itself does not stop a page from any origin from opening a websocket connection to our server. Any such protection has to be implemented on the backend, which is our Golang server.\nBy default, the library used here, \u0026ldquo;gorilla/websocket\u0026rdquo;, minimally ensures that the Origin of the request matches the host of the server. To accommodate the Elm frontend, we need to make the following modification:\nvar upgrader = websocket.Upgrader{ ReadBufferSize: 1024, WriteBufferSize: 1024, CheckOrigin: func(r *http.Request) bool { _, err := r.Cookie(\u0026#34;cookie-name\u0026#34;) if err != nil { return false } return true }, }\nThe important function here (without which we can\u0026rsquo;t use Elm to run our \u0026ldquo;chat\u0026rdquo; application) is the CheckOrigin function. 
If it is not modified, we will see the following error:\nupgrade:websocket: request origin not allowed by Upgrader.CheckOrigin\nThere are a few approaches for deciding whether to allow the websocket connection to be established. One way is to check the origin the request comes from. Another way I considered was to pass some sort of HTTP header while establishing the websocket connection. That does not seem possible, as the browser WebSocket API does not allow setting custom headers.\nThe easier way to pass the extra data required to establish and initialize the websocket connection is via cookies, which the browser sends along with the request that establishes the websocket connection.\nThis is why we have a route to create the cookie (described in the next section).\nRefer to the following issue for more details on how to resolve the error:\nhttps://github.com/gorilla/websocket/issues/367\nAdding route to create cookie # The securecookie package uses the block key for AES encryption, so the block key must be 16, 24 or 32 bytes long. With a shorter key such as \u0026ldquo;a-lot-secret\u0026rdquo; (12 bytes), Encode returns an error and no cookie is set. The snippet below therefore generates a random 32-byte block key at startup (keys generated this way change on every restart, so a real deployment would load them from configuration instead).\nvar hashKey = []byte(\u0026#34;very-secret\u0026#34;) var blockKey = securecookie.GenerateRandomKey(32) var s = securecookie.New(hashKey, blockKey) type HomeHandler struct{} func (h HomeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { value := map[string]string{ \u0026#34;foo\u0026#34;: \u0026#34;bar\u0026#34;, } if encoded, err := s.Encode(\u0026#34;cookie-name\u0026#34;, value); err == nil { cookie := \u0026amp;http.Cookie{ Name: \u0026#34;cookie-name\u0026#34;, Value: encoded, Path: \u0026#34;/\u0026#34;, Secure: true, HttpOnly: true, } log.Printf(\u0026#34;Cookie Generated :: %v\u0026#34;, encoded) http.SetCookie(w, cookie) } else { log.Printf(\u0026#34;Unable to encode cookie :: %v\u0026#34;, err) } log.Println(\u0026#34;Home Handler endpoint reached\u0026#34;) w.WriteHeader(http.StatusOK) w.Write([]byte(\u0026#34;OK\u0026#34;)) }\nThis code shows how we can create the cookie (which we need in order to add some protection to our server when establishing the websocket connection to our 
server.)\nElm code # I\u0026rsquo;m going to develop this further, so this is a snapshot of the code that works with the Golang server at this point in time.\nport module Chat exposing (..) import Browser import Html exposing (Html, button, div, input, li, text, ul) import Html.Attributes exposing (placeholder, type_, value) import Html.Events exposing (on, onClick, onInput) import Json.Decode as D main : Program () Model Msg main = Browser.element { view = view , init = \\() -\u0026gt; init , update = update , subscriptions = subscriptions } subscriptions : Model -\u0026gt; Sub Msg subscriptions _ = messageReceiver Recv port sendMessage : String -\u0026gt; Cmd msg port messageReceiver : (String -\u0026gt; msg) -\u0026gt; Sub msg type alias Model = { draft : String , messages : List String } init : ( Model, Cmd Msg ) init = ( Model \u0026#34;\u0026#34; [], Cmd.none ) type Msg = DraftChanged String | Send | Recv String view : Model -\u0026gt; Html Msg view model = div [] [ ul [] (List.map (\\msg -\u0026gt; li [] [ text msg ]) model.messages) , input [ type_ \u0026#34;text\u0026#34; , placeholder \u0026#34;Draft\u0026#34; , onInput DraftChanged , on \u0026#34;keydown\u0026#34; (ifIsEnter Send) , value model.draft ] [] , button [ onClick Send ] [ text \u0026#34;Send\u0026#34; ] ] ifIsEnter : msg -\u0026gt; D.Decoder msg ifIsEnter msg = D.field \u0026#34;key\u0026#34; D.string |\u0026gt; D.andThen (\\key -\u0026gt; if key == \u0026#34;Enter\u0026#34; then D.succeed msg else D.fail \u0026#34;some other key\u0026#34; ) update : Msg -\u0026gt; Model -\u0026gt; ( Model, Cmd Msg ) update msg model = case msg of DraftChanged draft -\u0026gt; ( { model | draft = draft } , Cmd.none ) Send -\u0026gt; ( { model | draft = \u0026#34;\u0026#34; } , sendMessage model.draft ) Recv message -\u0026gt; ( { model | messages = model.messages ++ [ message ] } , Cmd.none )\nThe important bit is embedding it into HTML. 
Elm has no built-in support for websockets, but the Elm application can interface with JavaScript via ports. Refer to the Elm documentation on this: https://guide.elm-lang.org/interop/ports.html\nThe Elm code here is almost the same as the one provided in the guide.\nFor the HTML piece, here is what would be created if one were to add it to a Hugo shortcode:\n\u0026lt;div id=\u0026#34;chat\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;script src=\u0026#34;{{ \u0026#34;toolsjs/chat.min.js\u0026#34; | relURL}}\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;script\u0026gt; var app = Elm.Chat.init({ node: document.getElementById(\u0026#34;chat\u0026#34;) }); var socket = new WebSocket(\u0026#39;ws://localhost:8080/ws\u0026#39;); app.ports.sendMessage.subscribe(function(message) { socket.send(message); }); socket.addEventListener(\u0026#34;message\u0026#34;, function(event) { app.ports.messageReceiver.send(event.data); }); \u0026lt;/script\u0026gt;\nWe load the JavaScript generated from our Elm codebase and use it to start the app. We then create a websocket that interacts with the app. Drafts sent from the app go out over the websocket via the sendMessage port. Messages arriving on the websocket are fed into the app via the messageReceiver port, and the app renders them on the screen.\nThis Elm application is pretty simple and doesn\u0026rsquo;t handle cases such as authenticating the user before establishing the websocket connection. For example, how do we create the websocket dynamically, only after Elm has confirmed that the user has \u0026ldquo;logged\u0026rdquo; in?\n","date":"20 December 2021","externalUrl":null,"permalink":"/build-chat-app-with-golang-websocket-and-elm-frontend/","section":"Posts","summary":"While building Elm-based frontends, I decided to take the opportunity to learn how to build a chat application. Truthfully, I’ve never built one before (nor have I needed to). 
But it seems like an interesting programming exercise for understanding how such applications are built, deployed, scaled and managed. For the frontend, I’m set on using Elm (you’ve probably seen a previous post on my “dislike” for other JavaScript-based frameworks, which is essentially all the popular ones on the market). For the backend, I will probably stick to Golang since that is the language I’m most comfortable with (all hail statically typed languages)\n","title":"Build Chat App with Golang Websocket and Elm Frontend","type":"posts"},{"content":"BMI, or Body Mass Index, is calculated by dividing one\u0026rsquo;s weight (in kilograms) by the square of one\u0026rsquo;s height (in metres). You can use the tool below to calculate it quickly.\nThere are 4 BMI categories: Underweight, Normal, Overweight and Obese.\nIf you are underweight, it would be best to check your diet to ensure that your body receives sufficient nutrition. This helps prevent conditions such as nutritional deficiencies or osteoporosis. Do seek medical advice if necessary.\nIf you are overweight or obese, it is vital to start watching your diet and exercising in order to lose weight over time. Being obese or overweight over long periods is worrying: when you\u0026rsquo;re young, the health problems won\u0026rsquo;t be obvious, but they will worsen as time marches on.\nPrivacy Notice: The tool below does not record any details, and nothing is sent to any server. All calculations are done within the browser.\nThe statistics for diabetes look relatively grim in Singapore, to the point where the country decided to wage a War on Diabetes. One way to reduce its incidence is to maintain a healthy body weight, which is judged by calculating BMI (Body Mass Index).\nFor Singaporeans (Asians), the cut-off for a healthy BMI is currently 23. 
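The formula above is simple enough to sketch in a few lines. This is a Go sketch, not the Elm code behind the calculator on this page, and the function names are hypothetical. The cut-offs of 18.5, 23 and 27.5 between the 4 categories are an assumption based on commonly cited thresholds for Asian populations; only the value 23 appears in this post.

```go
package main

import "fmt"

// bmi divides weight in kilograms by the square of height in metres.
func bmi(weightKg, heightM float64) float64 {
	return weightKg / (heightM * heightM)
}

// category maps a BMI value to one of the 4 categories.
// Assumed cut-offs: 18.5, 23 and 27.5 (commonly cited for Asian
// populations); adjust them if a different guideline applies.
func category(b float64) string {
	switch {
	case b < 18.5:
		return "Underweight"
	case b < 23:
		return "Normal"
	case b < 27.5:
		return "Overweight"
	default:
		return "Obese"
	}
}

func main() {
	b := bmi(70, 1.75) // 70 / (1.75 * 1.75)
	fmt.Printf("BMI %.1f: %s\n", b, category(b))
}
```

For 70 kg and 1.75 m, this prints BMI 22.9: Normal, just below the cut-off of 23 used for the Normal range.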
Previously, it was 25 (based on research done mostly on Western populations), but further studies indicate differences in fat/muscle composition between populations. Refer to some of the studies here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5571887/\nTechnical details of BMI tool # This section has nothing to do with BMI itself; it details how the BMI calculator is embedded into this site.\nThe BMI calculator on this page is built with Elm and then embedded into Hugo. For the details, refer to the following page: ELM Frontend in Hugo Static Site\nIn the future, the tool will be improved to accept weight and height in other units, such as pounds or feet/inches.\n","date":"5 December 2021","externalUrl":null,"permalink":"/bmi-calculator/","section":"Posts","summary":"BMI, or Body Mass Index, is calculated by dividing one’s weight (in kilograms) by the square of one’s height (in metres). You can use the tool below to calculate it quickly.\n","title":"BMI Calculator","type":"posts"},{"content":"To see the Elm component in action, scroll down to the Elm Component Demonstration section.\nMotivation # I wanted to try writing a frontend application that provides some sort of dynamic functionality, e.g. doing quick mathematical calculations or fetching data from backend APIs. However, the frontend world is pretty complex (and continues to be so to date), and there are many factors to consider when writing one:\nChoice of frontend language/framework\nSingle Page Application? Or JavaScript embedded into a server-rendered page?\nSEO considerations\nChoice of frontend language # For the frontend language, we are usually restricted to JavaScript and the various frameworks built for it. The popular ones at the time of writing are React, Vue and Angular. 
Fortunately and unfortunately, I did try coding in React and Vue, and they turned out to be far more complex than I would like. Complexity is a big problem for me since I mostly write backend code and DevOps automation pipelines. I only touch frontend code very rarely, so I need code that is easy to read, has strong type hints and is easy to come back to. Both React and Vue become hard the moment one needs to interact with a backend system. Suddenly, app-wide state management libraries (Redux and Vuex) come into the picture, and those libraries are tremendously complex.\nAn additional pain point is webpack, which is more or less \u0026ldquo;needed\u0026rdquo; to safely transpile JavaScript into a \u0026ldquo;compatible\u0026rdquo; version that works on the browsers the application developer wishes to support. We can rely on code generation tools (e.g. the Create React App CLI) to generate an initial webpack configuration. Even so, there is still an irksome feeling of depending on something that requires configuration, yet which we implicitly trust and leave alone because \u0026ldquo;it just works\u0026rdquo; for now.\nMost likely, all the above \u0026ldquo;mini rants\u0026rdquo; are a matter of personal taste. They could stem mostly from bad experiences working with messy React and Vue code, and React and Vue code can certainly be elegant and easy to understand.\nFor now, I write my frontend code in a programming language called Elm. It has a lot of nice properties, but the main draw is its compiler error messages, which literally \u0026ldquo;scream\u0026rdquo; where the issue with the code is. 
Other nice properties include:\nVery descriptive error messages Static typing (At this point, I really hate coding in dynamically typed programming languages - with static typing, you can avoid a whole class of coding issues and also get decent type hinting in your IDE) Code cannot \u0026ldquo;compile\u0026rdquo; if some cases are left unhandled - e.g. the compiler forces you to handle every branch of an if/case expression, to ensure there is no \u0026ldquo;unexpected\u0026rdquo; view on the frontend Simple toolchain; just the elm CLI. No webpack, gulp or grunt to configure The language requires the functions one writes to be \u0026ldquo;pure\u0026rdquo; - functions with no side effects - which makes code very testable and makes it easy for the compiler to optimize the code (removing unused code is much easier with pure functions than with code written in an object-oriented style). For an example of why this matters, refer to the following repo: https://github.com/you-dont-need/You-Dont-Need-Momentjs You can read more about the benefits on the Elm homepage\nThere are, of course, certain disadvantages:\nElm is less popular than React, Vue and Angular - so there is less guidance available on the internet. Most likely, you would need to rely on the library reference documentation to code things out, rather than \u0026ldquo;copy and paste\u0026rdquo; from Stack Overflow posts, Medium articles, GitHub Gists etc Elm is harder to get used to - it uses a functional programming style. I\u0026rsquo;m mostly used to object-oriented programming, where we create \u0026ldquo;object\u0026rdquo; structures in codebases and add properties and functions to said objects. However, everything in Elm is a function, and these functions pipe their outputs into each other.
SPA vs SSR # This is the part that befuddled me the most - deciding how the application is going to be served on the internet.\nThe initial thought was to code a \u0026ldquo;normal\u0026rdquo; Elm application. The Elm application would handle routing and other logic itself. All frontend logic would be handled within the app (and this is where the problem starts)\nThe main concerns stem from some of the usual SEO practices out there:\nIf possible, render the page on the server side - so that Google\u0026rsquo;s crawler can pick up the page correctly when indexing it. An SPA may take a while to render, and the indexing process might capture a \u0026ldquo;premature\u0026rdquo; render of the page. (I haven\u0026rsquo;t seen much SSR for Elm - maybe only the elm-pages GitHub project) Ensure there is a sitemap (Elm cannot serve a plain XML page - an Elm app is compiled to JS and embedded into an HTML page, so even if the XML were generated, it would be wrapped in HTML, which is not the format Google Search indexing is looking for) Add meta tags in the HTML head tag - a meta description and tags that control how the page looks when shared on Facebook/LinkedIn etc (This is impossible to do in Elm) The 3 reasons above made me reconsider deploying a separate Elm website on a different domain - the page would be practically \u0026ldquo;invisible\u0026rdquo; on the World Wide Web. Search indexing is still necessary for pages to appear in Google search results, and a page is less useful if it cannot be discovered via Google searches.\nWith that, I\u0026rsquo;m currently experimenting with adding such functionality, coded in Elm, into this blog (which is generated via Hugo).
Some hacks are needed since the site goes through Netlify build processes - hopefully, continuing with these hacks won\u0026rsquo;t cause too many issues.\nElm Component Demonstration # The demonstrated Elm component is below the horizontal line. It only has simple functionality, as the main aim is to demonstrate that it is possible to embed such code into Hugo in the first place (although it\u0026rsquo;s done in a pretty hacky way).\nThe application stores a counter and displays it to you, the user. If you click on the \u0026ldquo;+\u0026rdquo; button, the counter rises by 1, and if you click on the \u0026ldquo;-\u0026rdquo; button, the counter drops by 1.\nElm codebase # You can view the code for the above tool in the following GitHub repo:\nhttps://github.com/hairizuanbinnoorazman/blog/blob/master/tools/src/Sample.elm\nThe most important bit is the \u0026ldquo;view\u0026rdquo;:\nview : Model -\u0026gt; Html Msg view model = div [] [ p [] [ text (\u0026#34;Value: \u0026#34; ++ String.fromInt model.value) ] , button [ style \u0026#34;height\u0026#34; \u0026#34;50px\u0026#34;, style \u0026#34;width\u0026#34; \u0026#34;100px\u0026#34;, onClick Increment ] [ text \u0026#34;+\u0026#34; ] , button [ style \u0026#34;height\u0026#34; \u0026#34;50px\u0026#34;, style \u0026#34;width\u0026#34; \u0026#34;100px\u0026#34;, onClick Decrement ] [ text \u0026#34;-\u0026#34; ] ] It is a pretty simple view which prints \u0026ldquo;Value: XX\u0026rdquo;, where XX is the counter value that the application holds. It also has 2 buttons - one to increment the counter by 1 and the other to decrement it by 1.
Each button sends a message (Increment or Decrement) through its onClick event.\nThese messages are handled in the update function:\nupdate : Msg -\u0026gt; Model -\u0026gt; ( Model, Cmd Msg ) update msg model = case msg of Increment -\u0026gt; ( { model | value = model.value + 1 }, Cmd.none ) Decrement -\u0026gt; ( { model | value = model.value - 1 }, Cmd.none ) Essentially, the messages update the centralized application state, which in this case is the \u0026ldquo;model\u0026rdquo; record with its value field.\nOf course, when the application starts, we need to initialize it with an initial value. This is handled by the init function:\ninit : ( Model, Cmd Msg ) init = ( Model 0, Cmd.none ) During development, we can view the app by running elm reactor and opening it in the browser.\nThe app can be \u0026ldquo;compiled\u0026rdquo; into a JavaScript file, which can then be embedded into HTML. In our case, we are embedding it into Hugo. To make it easy to embed into various posts on this blog, we can create custom shortcodes. You can read about creating shortcodes in Hugo here: https://gohugo.io/templates/shortcode-templates/\nThe commands to do this:\nelm make --optimize --output=sample.js ./src/Sample.elm uglifyjs sample.js --compress \u0026#34;pure_funcs=[F2,F3,F4,F5,F6,F7,F8,F9,A2,A3,A4,A5,A6,A7,A8,A9],pure_getters,keep_fargs=false,unsafe_comps,unsafe\u0026#34; | uglifyjs --mangle --output sample.min.js The first command generates the JS that can be embedded into HTML. The second command is an optimization step. JavaScript is shipped as a script - it\u0026rsquo;s not compiled to a binary - so the whole script is sent over to the client side.
It\u0026rsquo;s best to keep the JavaScript file as small as possible, and this is done with uglifyjs - it removes whitespace and replaces long, descriptive function names with short identifiers (making the code really hard to read in that form, but extremely light to send to the client). This process can sometimes reduce the size of the script by almost 80%, which is a huge saving in network bandwidth when sending the JS over the wire.\nTo create the shortcode in Hugo using the generated JavaScript of the Elm component:\n\u0026lt;div id=\u0026#34;sample-elm\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;script src=\u0026#34;{{ \u0026#34;toolsjs/sample.min.js\u0026#34; | relURL}}\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;script\u0026gt; app = Elm.Sample.init({ node: document.getElementById(\u0026#34;sample-elm\u0026#34;) }); \u0026lt;/script\u0026gt; The first step is to have a div with the id. I\u0026rsquo;m not exactly sure which other HTML elements are used by this Hugo template - hence, I added -elm to the id to ensure that no other element has the same id. We then load the JavaScript file, which provides Elm.Sample.init so that we can initialize and start the app on that div.\nConcluding words # I actually didn\u0026rsquo;t expect this mishmash of technologies to work in the first place.
However, it is nice that Elm provides this mechanism (which is probably going to stay), because teams that want to try Elm without investing 100% in it can have Elm take over certain elements of an HTML page.\nSo far, I don\u0026rsquo;t foresee too many issues with this approach - but with more Elm components, issues may come up - we shall see.\n","date":"27 November 2021","externalUrl":null,"permalink":"/elm-frontend-in-hugo-static-site/","section":"Posts","summary":"To view the Elm component in action - scroll down to the Elm Component Demonstration section\nMotivation # I wanted to learn how to write a frontend application that provides some dynamic functionality - e.g. doing quick mathematical calculations, fetching data from backend APIs etc. However, the frontend world is pretty complex (and continues to be so to date) - there are many factors to take note of when writing one:\n","title":"Elm Frontend in Hugo Static Site","type":"posts"},{"content":"","date":"7 November 2021","externalUrl":null,"permalink":"/categories/r/","section":"Article Categories","summary":"","title":"R","type":"categories"},{"content":"","date":"7 November 2021","externalUrl":null,"permalink":"/tags/r/","section":"Technology Tags","summary":"","title":"R","type":"tags"},{"content":"","date":"7 November 2021","externalUrl":null,"permalink":"/categories/rgoogleslides/","section":"Article Categories","summary":"","title":"Rgoogleslides","type":"categories"},{"content":"","date":"7 November 2021","externalUrl":null,"permalink":"/tags/rgoogleslides/","section":"Technology Tags","summary":"","title":"Rgoogleslides","type":"tags"},{"content":"There was a change to the Google Slides API that resulted in an inability to upload images from Google Drive into Google Slides programmatically.
Refer to the following issue on the rgoogleslides GitHub repo - https://github.com/hairizuanbinnoorazman/rgoogleslides/issues/28.\nSeeing how the state of the issue hasn\u0026rsquo;t changed in over a year, I personally don\u0026rsquo;t think a change/fix is coming. The only way around this is some sort of workaround.\nBefore going deeper, we first need to understand why people want images stored in Google Drive to be \u0026ldquo;injected\u0026rdquo; into Google Slides. The Google Slides API request that inserts images into slides accepts a URL, but previously, it also accepted the Drive ID of an image. The URL has to be \u0026ldquo;public\u0026rdquo;, or at least reachable by Google\u0026rsquo;s servers so that they can fetch the image into the slides. This isn\u0026rsquo;t ideal if the image is a graph generated from our data - we wouldn\u0026rsquo;t want this image to be public at all. Fortunately, the Drive ID option was available then, which allowed users to add images to their own Drive and then programmatically add them to Google Slides. The image never needed to be exposed publicly.\nHowever, now that this mechanism is \u0026ldquo;broken\u0026rdquo;, we need some sort of workaround: a way to make an image that we wish to keep private temporarily available \u0026ldquo;publicly\u0026rdquo;, protected by some sort of \u0026ldquo;credentials\u0026rdquo; so that no one can come along and get the image by accident.\nOne way I can think of is to put the image on an Nginx server and temporarily serve it with some sort of \u0026ldquo;key\u0026rdquo; as a parameter in the URL. However, this setup is a bit of a hassle for the average user of this package - a data analyst generally wouldn\u0026rsquo;t deal much with servers.\nAnother way is to use S3-style object storage (e.g. Google Cloud Storage or AWS S3).
Luckily, the APIs for those are pretty clear and well defined. What makes this ideal is that images can be kept under \u0026ldquo;lock and key\u0026rdquo; most of the time. We can use the \u0026ldquo;signed URLs\u0026rdquo; mechanism, which generates a very long URL containing a signature, making it unlikely that anyone could guess it. We can also set the signed URLs to expire in 5 minutes to limit the chance of anyone brute forcing their way to the image. This post will showcase how to do this.\nThis blog post builds on the previous example, Sending GGPlot to Google Slides\nSetting up authentication # You would need the following:\nClient ID and Client Secret for the rgoogleslides package (Desktop type). Refer to the following link: Link - no roles are needed here Service account with the \u0026ldquo;Storage Object Admin\u0026rdquo; role Enable Google Drive API Enable Google Slides API Enable Cloud Storage API The Script # We first load the libraries that we\u0026rsquo;ll need for this exercise\nlibrary(rgoogleslides) library(googleCloudStorageR) library(ggplot2) library(png) The next step is needed for googleCloudStorageR - it apparently looks for the following environment variables when creating the signed URL\nSys.setenv(\u0026#34;GCS_DEFAULT_BUCKET\u0026#34; = \u0026#34;XXX-BUCKET-NAME-XXX\u0026#34;) Sys.setenv(\u0026#34;GCS_AUTH_FILE\u0026#34;=\u0026#34;/XXXXXX/service-account.json\u0026#34;) gcs_global_bucket(\u0026#34;XXX-BUCKET-NAME-XXX\u0026#34;) The next step is to create the image of the plot that we will send to the slides\n# Do up a quick plot on iris dataset first_plot \u0026lt;- qplot(iris$Sepal.Length, iris$Sepal.Width, color = iris$Species) ggsave(\u0026#34;first_plot.png\u0026#34;, first_plot) # Determine the dimensions of the image image \u0026lt;- png::readPNG(\u0026#34;first_plot.png\u0026#34;) dimension \u0026lt;- dim(image) image_width \u0026lt;-
dimension[2]/8 # Calculate to your requirements image_height \u0026lt;- dimension[1]/8 # Calculate to your requirements Image height and width are needed to calculate how to position the image on the slides; they are used at the end. Note that png::readPNG returns an array of height x width x channels, so the width is the second dimension.\nNext, we authorize both the rgoogleslides and googleCloudStorageR packages\nrgoogleslides::authorize(\u0026#34;XXX-CLIENT-ID-XXX.apps.googleusercontent.com\u0026#34;, \u0026#34;XXX-CLIENT-KEY-XXX\u0026#34;) googleCloudStorageR::gcs_auth(\u0026#34;auth.json\u0026#34;) We then upload the image to Google Cloud Storage. With the uploaded image, we can use the returned metadata to generate the signed URL\naa = googleCloudStorageR::gcs_upload(\u0026#34;first_plot.png\u0026#34;, predefinedAcl = \u0026#34;bucketLevel\u0026#34;) signedURL = gcs_signed_url(aa) The value of signedURL can be opened in the browser - it should let you view the image as is. By default, the link is valid for 1 hour, but we can shorten this as necessary (via the expiry_ts argument of gcs_signed_url).\nThe next step is to create the slides and then inject the image into them\n# Create a new googleslides presentation slide_id \u0026lt;- rgoogleslides::create_slides(\u0026#34;Test Analysis NEXT\u0026#34;) slide_details \u0026lt;- rgoogleslides::get_slides_properties(slide_id) # Obtain the slide page that the image is to be added to slide_page_id \u0026lt;- slide_details$slides$objectId # Get the position details of the element on the slide page_element \u0026lt;- rgoogleslides::aligned_page_element_property(slide_page_id, image_height = image_height, image_width = image_width) request \u0026lt;- rgoogleslides::add_create_image_request(url = signedURL, page_element_property = page_element) response \u0026lt;- rgoogleslides::commit_to_slides(slide_id, request) Notice that the signedURL, image_height and image_width variables are used here.\nThe response variable should contain the ID of the Slides presentation that was created\nFull script # The full script is as
follows (without the explanation).\nSubstitute in the values as needed by the script\nlibrary(rgoogleslides) library(googleCloudStorageR) library(ggplot2) library(png) Sys.setenv(\u0026#34;GCS_DEFAULT_BUCKET\u0026#34; = \u0026#34;XXX-BUCKET-NAME-XXX\u0026#34;) Sys.setenv(\u0026#34;GCS_AUTH_FILE\u0026#34;=\u0026#34;/XXXXXX/service-account.json\u0026#34;) gcs_global_bucket(\u0026#34;XXX-BUCKET-NAME-XXX\u0026#34;) # Do up a quick plot on iris dataset first_plot \u0026lt;- qplot(iris$Sepal.Length, iris$Sepal.Width, color = iris$Species) ggsave(\u0026#34;first_plot.png\u0026#34;, first_plot) # Determine the dimensions of the image image \u0026lt;- png::readPNG(\u0026#34;first_plot.png\u0026#34;) dimension \u0026lt;- dim(image) image_width \u0026lt;- dimension[2]/8 # Calculate to your requirements image_height \u0026lt;- dimension[1]/8 # Calculate to your requirements rgoogleslides::authorize(\u0026#34;XXX-CLIENT-ID-XXX.apps.googleusercontent.com\u0026#34;, \u0026#34;XXX-CLIENT-KEY-XXX\u0026#34;) googleCloudStorageR::gcs_auth(\u0026#34;auth.json\u0026#34;) aa = googleCloudStorageR::gcs_upload(\u0026#34;first_plot.png\u0026#34;, predefinedAcl = \u0026#34;bucketLevel\u0026#34;) signedURL = gcs_signed_url(aa) # Create a new googleslides presentation slide_id \u0026lt;- rgoogleslides::create_slides(\u0026#34;Test Analysis NEXT\u0026#34;) slide_details \u0026lt;- rgoogleslides::get_slides_properties(slide_id) # Obtain the slide page that the image is to be added to slide_page_id \u0026lt;- slide_details$slides$objectId # Get the position details of the element on the slide page_element \u0026lt;- rgoogleslides::aligned_page_element_property(slide_page_id, image_height = image_height, image_width = image_width) request \u0026lt;- rgoogleslides::add_create_image_request(url = signedURL, page_element_property = page_element) response \u0026lt;- rgoogleslides::commit_to_slides(slide_id, request) Suggestions # A few suggestions for using this
mechanism as a common framework to automate sending plots to Google Slides:\nSet a shorter expiry time for the signed URLs, e.g. 5-10 minutes (depending on how long the script takes to complete) Set a lifecycle rule so that objects are deleted 1 day after they are created. This removes the need to clean up images from the bucket manually. The storage bucket is just a \u0026ldquo;cache\u0026rdquo; for the images; we would generally regenerate the images for future plots from the code base. ","date":"7 November 2021","externalUrl":null,"permalink":"/sending-ggplot-graphs-to-googleslides-again/","section":"Posts","summary":"There was a change to the Google Slides API that resulted in an inability to upload images from Google Drive into Google Slides programmatically. Refer to the following issue on the rgoogleslides github repo - https://github.com/hairizuanbinnoorazman/rgoogleslides/issues/28.\n","title":"Sending ggplot graphs to googleslides again","type":"posts"},{"content":"NOTE: BEFORE READING THIS - ALL SCREENSHOTS BELOW ARE TAKEN SOMETIME IN OCTOBER 2021. THE UI MAY CHANGE IN THE FUTURE - USE THIS AS A ROUGH GUIDE AND NOT AS ABSOLUTE TRUTH\nThere is a very important function in rgoogleslides - without it, the package wouldn\u0026rsquo;t function at all:\nauthorize() This function gets the user to authenticate to the relevant Google services, which in this case are Google Drive and Google Slides. Google Drive API access allows the package to create Google Slides on your behalf, while Google Slides API access allows us to programmatically manipulate the slides\nThe function comes with its own default client ID and client secret - but of course, these SHOULD NOT BE USED IN PRODUCTION. Let me repeat this point: DO NOT JUST USE THESE IN PRODUCTION - don\u0026rsquo;t implicitly trust the client ID and client secret being used here. Anything can happen - e.g.
owner\u0026rsquo;s GCP account gets hacked, the project owner accidentally deletes the client ID and client secret, or too many users share the credentials and hit some sort of quota etc. Let\u0026rsquo;s just say it may be fine for a short while - but anything can happen in the long run.\nLuckily, the function optionally accepts a client ID and client secret generated from the user\u0026rsquo;s own GCP account\nYou can create these credentials in a GCP account in the following way. You might first need to check that the Google Drive and Google Slides APIs are enabled. The following set of images may help guide you.\nYou will need to ensure the Drive and Slides APIs are turned on\nTo create the client ID and client secret for the package, you can follow these steps\nWe need to pick the \u0026ldquo;Desktop App\u0026rdquo; type\nThat immediately generates the client ID and client secret.\nYou can then use them as in the code snippet below:\nauthorize(\u0026#34;XXXXXX.apps.googleusercontent.com\u0026#34;, \u0026#34;XXXXX\u0026#34;) slideID = create_slides(\u0026#34;TEXT\u0026#34;) The code snippet should work after that - there could be some account-specific differences, but they should be resolvable\n","date":"30 October 2021","externalUrl":null,"permalink":"/rgoogleslides-using-your-own-account-client-id-and-secret/","section":"Posts","summary":"NOTE: BEFORE READING THIS - ALL SCREENSHOTS BELOW ARE TAKEN SOMETIME IN OCTOBER 2021. THE UI MAY CHANGE IN THE FUTURE - USE THIS AS A ROUGH GUIDE AND NOT AS ABSOLUTE TRUTH\n","title":"RGoogleslides - using your own account client id and secret","type":"posts"},{"content":"A previous post detailed how to set up some open source tooling to capture logs, retrieve metrics and capture distributed trace information from apps.
That post covered the setup of the logging system (Loki), the distributed tracing system (Tempo) and the metrics collection system (Prometheus). Refer to the link below.\nSetting up observability tooling in GKE\nFor all this operational information to be captured, applications need to be \u0026ldquo;instrumented\u0026rdquo; to expose it. I\u0026rsquo;m mostly familiar with Golang, so I will provide code samples of what an app instrumented in Golang may look like. Each type of operational information is collected in a different way. Metrics are mostly collected via a pull approach, where Prometheus probes a specified endpoint on the application on a specified schedule (usually known as the \u0026ldquo;agentless\u0026rdquo; approach). Logs are written to local files managed by the container runtime, which are then fetched and pushed by an agent - in the setup from the previous blog post, this agent is promtail. For distributed traces, we embed a library (mostly Jaeger, which used to be the de facto way of collecting traces), and that library handles pushing the traces over the wire to the endpoints of the trace collector (the Tempo distributor).\nHere is a sample codebase showing how this can be done:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/fullObservability\nThe codebase will be continuously updated as time goes by - there is news of OpenTelemetry (https://github.com/open-telemetry), which could be used as a one-stop alternative to manually instrumenting each type of information separately. However, at the time of writing, it still seems to be under heavy development, so it might be worth waiting to see how things go.
Also, I would expect many companies already working in this space to be using older libraries such as the Prometheus and Jaeger client libraries rather than OpenTelemetry libraries, and it would take a while before the transition happens.\nApplication Basics # The application is a simple API server that sleeps and then returns an OK response to the caller. It can be configured to call another application\u0026rsquo;s API server as well. For now, configuration is done via environment variables, which provides a simple way to set up dependencies between applications. Future iterations could allow the configuration to be set via YAML/JSON config files passed in via Kubernetes ConfigMaps.\nThere are also other \u0026ldquo;administrative\u0026rdquo; parts of the application that need to be built, namely the health check and readiness probes. These probes indicate the health of the application (e.g. whether it is stuck processing and unable to process any further items). These endpoints need to be created and made accessible when deployed in GKE. The endpoints created for this purpose are /healthz and /readyz.\nMetrics - Monitoring # There isn\u0026rsquo;t much to monitor in such a simple application; the only easy one is the number of requests that the application receives - not counting the /healthz and /readyz endpoints.\nvar ( requestsTotal = promauto.NewCounter(prometheus.CounterOpts{ Name: \u0026#34;requests_total\u0026#34;, Help: \u0026#34;The total number of processed events\u0026#34;, }) ) Logging # To make analysis \u0026ldquo;easier\u0026rdquo;, logs are written in JSON format using the logrus Golang library.
There is a JSON formatter (logrus.JSONFormatter) that comes with it, so we just need to \u0026ldquo;switch it on\u0026rdquo;\nDistributed traces # To make distributed traces easier to manage, it is best to follow the client library\u0026rsquo;s conventions for controlling how traces are collected. Refer to the Jaeger client library used for this app here: https://github.com/jaegertracing/jaeger-client-go. The README of the repo lists the environment variables that the library reads to control the collection of trace data. Examples of other ways to initialize the Jaeger tracer can be found here: https://github.com/jaegertracing/jaeger-client-go/blob/master/config/example_test.go\ncfg, err := jaegercfg.FromEnv() Future iterations of this sample application may also collect metrics about the trace collection itself. We would want to monitor that traces are successfully sent to the collector and are not failing due to the volume of traces being generated.\nThe environment variables that control how distributed trace data is collected are defined within the Kubernetes manifest files. This is covered in the Deployment section of this blog post.\nDeployment # The application is deployed into a Kubernetes cluster. We need a Dockerfile to build the Docker (OCI) images. The built images are then pushed into Google Container Registry, which serves as the container registry for GKE.\nTo deploy and run the containers in the cluster, we rely on simple Kubernetes manifest YAML files. To templatize parts of them, we use Kustomize.
With Kustomize, we can specify which images to use, and these are injected into the manifest files before they are applied to the cluster.\nA large part of the Kubernetes manifest files defines the environment variables that control the behaviour of the application as well as distributed trace collection etc.\nenv: - name: WAIT_TIME value: \u0026#34;1\u0026#34; - name: TARGET value: \u0026#34;MIAO\u0026#34; - name: SERVICE_NAME value: app2 - name: CLIENT_URL value: \u0026#34;http://app3:8080\u0026#34; - name: JAEGER_AGENT_HOST value: tempo-tempo-distributed-distributor - name: JAEGER_REPORTER_LOG_SPANS value: \u0026#34;true\u0026#34; - name: JAEGER_SAMPLER_TYPE value: const - name: JAEGER_SAMPLER_PARAM value: \u0026#34;1\u0026#34; Environment variables prefixed with JAEGER relate to distributed trace collection. WAIT_TIME refers to how long the API waits before responding. SERVICE_NAME is the name of the service, which is passed on to the distributed traces. CLIENT_URL refers to the endpoint that this service should call before returning to the caller. For the actual logic, please refer to the code base.\nThe Makefile contains a couple of convenience targets. They can be used to build the images, push them to the container registry and deploy them into the GKE cluster. The Makefile is configured to accept params such as VERSION to specify the version of the image to build and push.\nmake build VERSION=0.0.5 make push VERSION=0.0.5 make deploy VERSION=0.0.5 ","date":"29 September 2021","externalUrl":null,"permalink":"/app-with-metrics-logs-and-distributed-traces/","section":"Posts","summary":"A previous post detailed how to set up some open source tooling to capture logs, retrieve metrics and capture distributed trace information from apps.
That post covered the setup of the logging system (Loki), the distributed tracing system (Tempo) and the metrics collection system (Prometheus). Refer to the link below.\n","title":"App with Metrics, Logs and Distributed Traces","type":"posts"},{"content":"Generally, most cloud providers come with all the observability tooling you need for your apps built into the platform. Common observability tools such as logging, monitoring and, nowadays, distributed tracing are usually available, and you can use them by reading up on the documentation for setting up each one. E.g. if your application runs inside a virtual machine and you need to collect metrics and logs from it, you may need to install an agent in that VM. The agent collects this information and sends it to the cloud provider\u0026rsquo;s centralized observability tooling, where it is presented to you via a UI. Most of the time, these tools are charged based on the amount of logs/metrics your application generates (so the fewer logs/metrics you generate, the cheaper it is to monitor your application - a very understandable situation). If your application runs in Kubernetes, the cluster may come with agents pre-installed, making it easier to use the logging/metrics/distributed tracing that the cloud provider has.\nHowever, what if you were in a \u0026ldquo;bare metal\u0026rdquo; kind of deployment, where your applications are not deployed onto a cloud provider? How would one handle it? What are some common ways to make such capabilities part of your deployment setup?\nIn this post, I aim to cover the deployment of logging, metrics and distributed tracing in a Kubernetes cluster.
In another blog post, I will probably cover how a user can view logs/metrics/traces on such a deployment. (Note: the application needs to be \u0026ldquo;instrumented\u0026rdquo; for such information to be collected)\nTo follow along with the setup here, view the following folder in this repo:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/kubernetes\nThere are various ways to install such components on Kubernetes. The direct way is to create the Kubernetes manifests and apply them directly on the cluster via kubectl apply -f \u0026lt;file names\u0026gt;. However, this is a hassle with many manifest files that contain many repeated values (e.g. namespace, labels etc). That is where templating projects come in, making it easy to \u0026ldquo;template\u0026rdquo; out such Kubernetes manifest files. One is Kustomize. Another tool for templating and installing software is Helm. Helm is the tool mostly used in this post.\nIt is assumed that Helm 3 is being used here. Helm 2 is pretty much outdated, and most charts out there now have installation instructions that assume you have Helm 3.\nThere may be further updates in the future for the installation of these observability tools - the updates will go into the codebase mentioned above.\nInstall Metrics Component # For metrics, the most common metrics server is Prometheus. Prometheus collects metrics data from applications by scraping specific paths/ports on the application side (this can be customized via Kubernetes annotations etc)\nPrometheus comes with its own UI, but the Prometheus UI is generally meant for \u0026ldquo;exploratory\u0026rdquo; work, where one explores and tries to understand how a metric behaves over time.
Generally, people don\u0026rsquo;t keep exploring metrics just to understand how their application behaves; they would rather have \u0026ldquo;dashboards\u0026rdquo; - pre-built charts that showcase the most important information about how their application is performing at that point in time. The Prometheus UI is not built for this; that job falls to another service, Grafana, which provides graphs/dashboards on top of the metrics information.\nHere are some of the Prometheus Helm charts that can be installed on a cluster:\nhttps://github.com/prometheus-community/helm-charts/tree/main/charts\nAfter installing Helm, you first need to prepare your local environment so that it knows about this chart repository.\nhelm repo add prometheus-community https://prometheus-community.github.io/helm-charts You can then install the helm chart. Rather than the helm install subcommand, it is better to use the upgrade subcommand with the --install flag. The same command can then be used to update the installation on the cluster without switching back and forth between the install and upgrade steps. This makes it easier to put into a script: you can rerun the script without worrying that it only works on \u0026ldquo;first time\u0026rdquo; installs, i.e. the script becomes \u0026ldquo;idempotent\u0026rdquo;.\nhelm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack For the chart provided, we need to make certain modifications to fit my use case. I\u0026rsquo;m deploying this onto GKE, which uses kubeDNS rather than CoreDNS (CoreDNS seems to be the default in some Kubernetes setups, e.g. those deployed via Ansible kubespray).
Hence, it is best to pass in a values file that customizes the installation so that Prometheus monitors kubeDNS rather than CoreDNS (the default).\nhelm upgrade --install -f prom.yaml kube-prometheus-stack prometheus-community/kube-prometheus-stack The full list of options is available in the values file of the respective chart:\nhttps://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml\nTo view the Prometheus UI, we can run the following command:\nkubectl port-forward service/kube-prometheus-stack-prometheus 9090 One of the important pages on the Prometheus UI is the \u0026ldquo;targets\u0026rdquo; page. As much as possible, we want to reduce the number of unhealthy targets.\nThe Prometheus stack also deploys Grafana. We can access it by port-forwarding a local port (e.g. 3000) to the Grafana service.\nkubectl port-forward service/kube-prometheus-stack-grafana 3000:80 The admin username and password can be found in the following secret: kube-prometheus-stack-grafana. They are set by the helm chart, so a different admin password can be configured via the chart values. We can run the following command to view the username and password being used.\nkubectl get secrets kube-prometheus-stack-grafana -oyaml The output for the above secret (not the full secret yaml definition) would show this:\napiVersion: v1 data: admin-password: cHJvbS1vcGVyYXRvcg== admin-user: YWRtaW4= ldap-toml: \u0026#34;\u0026#34; kind: Secret metadata: annotations: meta.helm.sh/release-name: kube-prometheus-stack meta.helm.sh/release-namespace: default ... name: kube-prometheus-stack-grafana namespace: default resourceVersion: \u0026#34;3410\u0026#34; uid: ff19912e-43fd-47c6-a97a-aec8fb687a87 type: Opaque The username and password look like gibberish, but they are just base64 encoded.
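The values can also be decoded programmatically. Here is a minimal Go sketch using the standard library encoding/base64 package. The encoded strings are copied from the secret output above, and decodeSecretValue is a hypothetical helper name.

```go
package main

import (
	`encoding/base64`
	`fmt`
)

// decodeSecretValue decodes one base64-encoded value taken from the data
// section of a Kubernetes Secret (hypothetical helper for illustration).
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return ``, err
	}
	return string(raw), nil
}

func main() {
	// Values copied from the kube-prometheus-stack-grafana secret shown above.
	data := map[string]string{
		`admin-user`:     `YWRtaW4=`,
		`admin-password`: `cHJvbS1vcGVyYXRvcg==`,
	}
	// Iterate in a fixed order, since Go map iteration order is random.
	for _, key := range []string{`admin-user`, `admin-password`} {
		value, err := decodeSecretValue(data[key])
		if err != nil {
			fmt.Println(key, `decode error:`, err)
			continue
		}
		fmt.Println(key, `=`, value)
	}
}
```

Running it prints admin as the user and prom-operator as the password. These are the same values that the base64 CLI reveals.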
We can reveal the actual values by running the following commands:\necho -n cHJvbS1vcGVyYXRvcg== | base64 --decode echo -n YWRtaW4= | base64 --decode Install Object Storage # In the case of \u0026ldquo;bare metal\u0026rdquo; deployments with data that shouldn\u0026rsquo;t be stored with cloud providers, we wouldn\u0026rsquo;t have an \u0026ldquo;S3\u0026rdquo; that we can use. S3 provides object storage and is well known as a service that provides cheap and durable storage (albeit in exchange for not being a filesystem and lacking the various filesystem guarantees). In Google Cloud, GCS is the object storage offered to customers.\nHowever, what if we don\u0026rsquo;t have this capability; how should we handle it?\nWell, there is Minio. It provides object storage and is S3 compatible - which means that one can use libraries that talk to S3, switch the endpoint to the Minio deployment, and it should still work as expected.\nhelm upgrade --install minio-operator minio/minio-operator There were some weird issues with setting up the certs for SSL, so I set up Minio with the SSL requirement turned off. Generally, you can skip SSL for internal traffic, but once I figure out why SSL didn\u0026rsquo;t work as expected, I would probably adjust the installation of the Minio object storage component to use SSL.\nhelm upgrade --install -f minio.yaml minio-operator minio/minio-operator The default values yaml file for the minio operator:\nhttps://github.com/minio/operator/blob/master/helm/minio-operator/values.yaml\nBy default, Minio doesn\u0026rsquo;t come with any buckets. At the same time, the Minio Operator Helm chart doesn\u0026rsquo;t seem to have any capability to provision buckets on initial deploy.
Hence, we need to script this out on our end to create the buckets.\nA simple way is to use the minio/mc image. It contains the mc CLI tool, which interacts with the Minio cluster. The slightly tricky part is getting this image to connect to the Minio cluster; this is defined in the following ConfigMap and Jobs. The ConfigMap provides a config.json file that is used by the mc CLI tool. It replaces the default config.json file, which then allows the Jobs to create the buckets.\napiVersion: v1 kind: ConfigMap metadata: name: mc-config data: config.json: | { \u0026#34;version\u0026#34;: \u0026#34;10\u0026#34;, \u0026#34;aliases\u0026#34;: { \u0026#34;yahoo\u0026#34;: { \u0026#34;url\u0026#34;: \u0026#34;http://minio1-hl.default.svc.cluster.local:9000\u0026#34;, \u0026#34;accessKey\u0026#34;: \u0026#34;minio\u0026#34;, \u0026#34;secretKey\u0026#34;: \u0026#34;minio123\u0026#34;, \u0026#34;api\u0026#34;: \u0026#34;s3v4\u0026#34;, \u0026#34;path\u0026#34;: \u0026#34;auto\u0026#34; } } } --- apiVersion: batch/v1 kind: Job metadata: name: make-bucket-testtest spec: template: spec: containers: - name: mc image: minio/mc args: [\u0026#39;mb\u0026#39;, \u0026#39;--ignore-existing\u0026#39;, \u0026#39;yahoo/testtest\u0026#39;] # For debugging # args: [\u0026#39;admin\u0026#39;, \u0026#39;info\u0026#39;, \u0026#39;yahoo\u0026#39;] volumeMounts: - mountPath: /root/.mc/config.json name: mc-config subPath: config.json restartPolicy: Never volumes: - configMap: defaultMode: 0777 name: mc-config name: mc-config --- apiVersion: batch/v1 kind: Job metadata: name: make-bucket-haha spec: template: spec: containers: - name: mc image: minio/mc args: [\u0026#39;mb\u0026#39;, \u0026#39;--ignore-existing\u0026#39;, \u0026#39;yahoo/haha\u0026#39;] # For debugging # args: [\u0026#39;admin\u0026#39;, \u0026#39;info\u0026#39;, \u0026#39;yahoo\u0026#39;] volumeMounts: - mountPath: /root/.mc/config.json name: mc-config subPath: config.json
restartPolicy: Never volumes: - configMap: defaultMode: 0777 name: mc-config name: mc-config We can view the Minio UI via the following command:\nkubectl port-forward service/minio1-console 9090 The username and password for it can be found here:\nkubectl get secrets minio1-secret -oyaml Install Logging Stack # There is a new player in town in the logging game: Loki, which I believe is built by the same team behind Grafana and other observability tooling.\nIn the past, ELK was the hot player when it came to handling logging. ELK stands for the Elasticsearch, Logstash and Kibana application stack, and the three of them handle logs pretty decently. Elasticsearch is able to store copious amounts of data. Logstash serves as the \u0026ldquo;computation\u0026rdquo; layer, which applies filters and other processing before passing the logs on to Elasticsearch. Kibana is the \u0026ldquo;presentation\u0026rdquo; layer that provides graphing and dashboarding capabilities to present the logs captured in Elasticsearch.\nThe interesting thing about logging, at least in the ELK case, is that you need a large amount of resources to store all that information. Also, since we are storing the logs in Elasticsearch, the logs are also being indexed. This makes the logs searchable, but it comes at a huge cost - indexes need to be built for them. Indexes for such large amounts of log data are not cheap - if you have GBs of log data, you can also expect GBs of indexes. All of this data is stored on disk, which makes it pretty expensive to handle.\nLoki presents an interesting way to solve this logging problem. It stores the logs in object storage (which is definitely a cheaper way to store raw log data).
The indexes can be stored in Loki\u0026rsquo;s own BoltDB-based store (I believe it is customized for its use) or in a Cassandra database (for a more production-like setup).\nI\u0026rsquo;m not sure how it\u0026rsquo;ll go, but it\u0026rsquo;ll be interesting to observe whether this solution plays out effectively in the future. It might not be the \u0026ldquo;final answer\u0026rdquo; to the logging problem for companies that can\u0026rsquo;t rely on cloud providers for their observability needs.\nTo install Loki:\nhelm repo add grafana https://grafana.github.io/helm-charts Once we run the command above, we can install the helm chart. Either run the next line for the default installation:\nhelm upgrade --install loki grafana/loki-distributed Or, if we want to customize the installation slightly:\nhelm upgrade --install -f loki.yaml loki grafana/loki-distributed The default values file for the loki-distributed chart:\nhttps://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml\nThe above installs only Loki - it doesn\u0026rsquo;t collect logs from the containers in the Kubernetes cluster. The logs are collected by log collection agents; some examples are filebeat, fluentd and promtail. Since we\u0026rsquo;re already using Loki, we might as well use promtail, a project created by the Grafana team.\nTo install promtail:\nhelm upgrade --install -f promtail.yaml promtail grafana/promtail The default values yaml file for promtail is here:\nhttps://github.com/grafana/helm-charts/blob/main/charts/promtail/values.yaml\nInstall Distributed Tracing Component # Distributed tracing is one of the newer parts of the observability space. This is partly because it was created to help analyze a newer software architecture - microservices. With microservices, it is extremely hard to understand how requests bounce around the various applications in the data centre.
The code is instrumented with snippets that report incoming/outgoing requests between the microservices to a centralized distributed tracing tool, which then provides visualizations that help us understand what\u0026rsquo;s happening under the hood of the apps.\nThe poster boys of distributed tracing have been Zipkin and Jaeger for a couple of years. However, just as Loki attempts to solve logging by reducing storage requirements, a similar argument can be made against Zipkin and Jaeger. Jaeger depends on databases for storage (namely Cassandra/Elasticsearch), and those databases are far more resource heavy to manage - they take more memory and way more storage. Storing distributed traces in S3 would be a nice, cheaper alternative if possible, and with that, I can finally mention Tempo.\nThe main draw of Tempo here is its capability to store data on S3-compatible storage. This allows us to store distributed traces much more cheaply, which also makes it easier to decide whether or not to sample distributed traces. If the price of storing traces drops enough, it would be worth keeping all traces - which would be pretty awesome.\nScripting it out # The folder https://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Environment/kubernetes contains a Makefile that creates the cluster and installs the observability tooling in one go.\nmake environment make cluster make observability The make environment target sets up the Helm repositories. The make cluster target creates the Kubernetes cluster; it is assumed that the gcloud tool is set up properly.
The make observability target then sets up tooling such as Prometheus, Minio, Loki, Promtail and Tempo - and all of these metrics/logs/traces are shown on Grafana.\n","date":"1 September 2021","externalUrl":null,"permalink":"/setting-up-observability-tooling-in-gke/","section":"Posts","summary":"Generally, most cloud providers come with all the observability tooling that you need for your apps built into the platform. Common observability tools such as logging, monitoring and, nowadays, distributed tracing are usually made available, and you can easily use them by reading the documentation on how to set up each tool. E.g. if your application runs inside a virtual machine and you need to collect metrics and logs from it, you may need to install an agent in that VM. The agent collects this information and sends it to the cloud provider’s centralized observability tooling, where it is presented to you via a UI. Most of the time, these tools are charged based on the amount of logs/metrics your application generates (so the fewer logs/metrics you generate, the cheaper it is to monitor your application - a very understandable/reasonable situation). If your application runs in Kubernetes, the cluster may come with agents pre-installed, making it easier to use the logging/metrics/distributed tracing that the cloud provider offers.\n","title":"Setting up Observability Tooling in GKE","type":"posts"},{"content":"As of now, one of the most common and easiest ways to have services communicate with each other is over HTTP. In real-world use cases, HTTPS is usually used (to ensure communications are secure), and the communication usually follows some sort of REST framework. This provides a structure that standardizes such communications across the various software applications out there.
It got to the point where entire companies were built around supporting this: e.g. Apigee, SmartBear\nHowever, HTTP-based communication comes with some overhead. The newer HTTP/2 version allows services to communicate with less overhead by creating long-lived connections, multiplexing multiple streams over a single connection and sending data in a binary format. Refer to this video for reference: https://www.youtube.com/watch?v=RoXT_Rkg8LA. However, we\u0026rsquo;re not going to go deep into this, as it is not the primary focus of this article. This article focuses on setting up golang gRPC services and load balancing their traffic with Envoy on Kubernetes.\nRefer to the following link while following this article:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicGRPC\nThe codebase may be updated as time goes on - updates will be listed in the README.md file.\nProtobuf file generation # The following file is to be saved in the \u0026ldquo;ticketing\u0026rdquo; module.
The proto file is used to generate the golang files that interpret the messages being passed back and forth between the client and server services.\nsyntax = \u0026#34;proto3\u0026#34;; option go_package = \u0026#34;github.com/hairizuanbinnoorazman/basic-grpc/ticketing\u0026#34;; package ticketing; service CustomerController { rpc GetCustomer(GetCustomerRequest) returns (Customer) {} rpc CreateCustomer(CreateCustomerRequest) returns (Customer) {} rpc ListCustomers(ListCustomersRequest) returns (CustomerList) {} } message GetCustomerRequest { string id = 1; } message CreateCustomerRequest { string first_name = 1; string last_name = 2; } message ListCustomersRequest {} message CustomerList { repeated Customer customers = 1; } message Customer { string id = 1; string first_name = 2; string last_name = 3; } We then need to generate the golang files that handle these messages.\nRun the following command (the protoc-gen-go and protoc-gen-go-grpc plugins need to be installed):\nprotoc --go_out=. --go_opt=paths=source_relative \\ --go-grpc_out=. --go-grpc_opt=paths=source_relative \\ ticketing/ticketing.proto That would produce 2 files: ticketing.pb.go and ticketing_grpc.pb.go.
These files are then used by the golang services.\nGolang GRPC Server # This is the golang gRPC server that uses the ticketing proto golang files\npackage main import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net\u0026#34; \u0026#34;os\u0026#34; \u0026#34;github.com/hairizuanbinnoorazman/basic-grpc/ticketing\u0026#34; \u0026#34;google.golang.org/grpc\u0026#34; ) var podID string = \u0026#34;test\u0026#34; type actualCustomerControllerServer struct { ticketing.UnimplementedCustomerControllerServer } func (a actualCustomerControllerServer) GetCustomer(context.Context, *ticketing.GetCustomerRequest) (*ticketing.Customer, error) { log.Println(\u0026#34;Hit Get Customer rpc call\u0026#34;) defer log.Println(\u0026#34;End Get Customer rpc call\u0026#34;) return \u0026amp;ticketing.Customer{ Id: podID, FirstName: \u0026#34;acac\u0026#34;, LastName: \u0026#34;accqqq\u0026#34;, }, nil } func main() { fmt.Println(\u0026#34;Server Start\u0026#34;) /* Only overwrite the default podID when POD_NAME is set; LookupEnv returns an empty string otherwise. */ if name, exists := os.LookupEnv(\u0026#34;POD_NAME\u0026#34;); exists { podID = name } else { fmt.Println(\u0026#34;POD_NAME not set, value of podID is test\u0026#34;) } lis, err := net.Listen(\u0026#34;tcp\u0026#34;, \u0026#34;0.0.0.0:12345\u0026#34;) if err != nil { log.Fatalf(\u0026#34;failed to listen: %v\u0026#34;, err) } var opts []grpc.ServerOption grpcServer := grpc.NewServer(opts...) ticketing.RegisterCustomerControllerServer(grpcServer, actualCustomerControllerServer{}) if err := grpcServer.Serve(lis); err != nil { log.Fatalf(\u0026#34;failed to serve: %v\u0026#34;, err) } } The POD_NAME is used to indicate which pod the reply is coming from. This Golang application is assumed to be running in a Kubernetes environment.
An environment variable needs to be fed to the application so that the different pods replying to the client portion of the application can be uniquely identified.\nGolang GRPC Client Application # This is the golang gRPC client that uses the ticketing proto golang files\npackage main import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;os\u0026#34; \u0026#34;time\u0026#34; \u0026#34;github.com/hairizuanbinnoorazman/basic-grpc/ticketing\u0026#34; \u0026#34;google.golang.org/grpc\u0026#34; ) func main() { domain, exists := os.LookupEnv(\u0026#34;SERVER_DOMAIN\u0026#34;) if !exists { domain = \u0026#34;localhost\u0026#34; } port, exists := os.LookupEnv(\u0026#34;SERVER_PORT\u0026#34;) if !exists { port = \u0026#34;12345\u0026#34; } var opts []grpc.DialOption opts = append(opts, grpc.WithTimeout(3*time.Second), grpc.WithInsecure()) conn, err := grpc.Dial(fmt.Sprintf(\u0026#34;%v:%v\u0026#34;, domain, port), opts...) if err != nil { log.Fatalf(\u0026#34;failed to dial: %v\u0026#34;, err) } defer conn.Close() for { getCustomerDetails(conn) time.Sleep(3 * time.Second) } } func getCustomerDetails(conn *grpc.ClientConn) { client := ticketing.NewCustomerControllerClient(conn) log.Println(\u0026#34;Start GetCustomerDetails\u0026#34;) defer log.Println(\u0026#34;End GetCustomerDetails\u0026#34;) zz, err := client.GetCustomer(context.Background(), \u0026amp;ticketing.GetCustomerRequest{}) if err != nil { log.Println(err) return } log.Println(zz) } It is best to construct the code such that it accepts SERVER_DOMAIN and SERVER_PORT as environment variables. These variables vary based on the deployment - in this case, they are specific to a Kubernetes environment.\nNote that the connection between client and server does not end immediately after the server sends its reply to the client. Messages can still be sent back and forth without terminating the connection.
Establishing this connection is the overhead referred to at the top of the article; resources are saved because the applications don\u0026rsquo;t need to re-establish it over and over again.\nBuild the docker image # Since we\u0026rsquo;re deploying to a Kubernetes environment, we need docker images. We can build them with the following Dockerfile.\nFROM golang:1.16 as base WORKDIR /app COPY go.mod go.sum ./ RUN go mod download FROM base as client COPY . . RUN go build -o app ./client CMD [\u0026#34;/app/app\u0026#34;] FROM base as server COPY . . RUN go build -o app ./server CMD [\u0026#34;/app/app\u0026#34;] EXPOSE 12345 An interesting point about this Dockerfile is that the go.mod and go.sum files are added to the base image first. These 2 dependency files are used to pull the required golang modules. As a result, the module download step forms its own layer, which can be cached as long as the golang module dependency files do not change.\nAn example of building a docker image from this Dockerfile: docker build --target client -t gcr.io/XXXXXXXX/grpc-client:XXXXXXXXXXX .\nNote how the image is tagged here via the -t flag. In our example, we\u0026rsquo;re deploying all of this into GKE, which is best used alongside GCP Container Registry. Using the gcr.io domain in the tag determines the registry that the docker images are pushed to.\nDeploy it into Kubernetes # We first need to deploy 1 client and at least 2 server instances into Kubernetes. For the yaml definitions below, the images of the grpc client and grpc server need to be altered to point at the images that were built. We can use kustomize to do so.
See the github link from above for a reference to this.\napiVersion: apps/v1 kind: Deployment metadata: name: app-client labels: app: client spec: replicas: 1 selector: matchLabels: app: client template: metadata: labels: app: client spec: containers: - name: client image: grpc-client:latest command: [\u0026#34;/app/app\u0026#34;] env: - name: SERVER_DOMAIN value: envoy - name: SERVER_PORT value: \u0026#34;8443\u0026#34; --- apiVersion: apps/v1 kind: Deployment metadata: name: app-server labels: app: server spec: replicas: 2 selector: matchLabels: app: server template: metadata: labels: app: server spec: containers: - name: server image: grpc-server:latest command: [\u0026#34;/app/app\u0026#34;] env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: APP_VERSION value: V1 ports: - containerPort: 12345 --- apiVersion: v1 kind: Service metadata: name: app-server-headless spec: type: ClusterIP clusterIP: None selector: app: server ports: - protocol: TCP port: 12345 targetPort: 12345 An important thing to note is that we need to create a \u0026ldquo;headless\u0026rdquo; service rather than a normal service. A normal service (i.e. one without clusterIP: None) only exposes a single virtual IP address, which gives Envoy insufficient information to balance traffic across the pods.
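To illustrate why Envoy needs every pod IP rather than a single virtual IP, here is a minimal Go sketch of round-robin selection over the addresses returned by a DNS lookup. This is roughly what a STRICT_DNS cluster with lb_policy ROUND_ROBIN does in the Envoy configuration. This is not Envoy code. The roundRobin type and the placeholder IPs are hypothetical, and the headless service name only resolves from inside the cluster.

```go
package main

import (
	`context`
	`fmt`
	`net`
	`sync`
	`time`
)

// roundRobin hands out backend addresses in turn (hypothetical helper, not Envoy code).
type roundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// Pick returns the next address and wraps around at the end of the list.
// It assumes addrs is non-empty.
func (r *roundRobin) Pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	addr := r.addrs[r.next%len(r.addrs)]
	r.next++
	return addr
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	// A headless service resolves to one A record per ready pod, whereas a
	// normal ClusterIP service resolves to a single virtual IP.
	ips, err := net.DefaultResolver.LookupHost(ctx, `app-server-headless.default.svc.cluster.local`)
	if err != nil || len(ips) == 0 {
		// Outside the cluster the name will not resolve, so use placeholder IPs.
		ips = []string{`10.0.0.11`, `10.0.0.12`}
	}
	lb := &roundRobin{addrs: ips}
	for i := 0; i < 4; i++ {
		fmt.Println(`send request to`, net.JoinHostPort(lb.Pick(), `12345`))
	}
}
```

With a normal service, the lookup would return a single virtual IP, so every pick would go to the same address.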
A headless service provides the full list of pod IP addresses.\nEssentially, using a headless service means we are not relying on Kubernetes to load balance the traffic streams from applications.\nThe following yaml definition deploys the Envoy instance that load balances the gRPC traffic\napiVersion: apps/v1 kind: Deployment metadata: name: envoy spec: replicas: 1 selector: matchLabels: app: envoy template: metadata: labels: app: envoy spec: containers: - name: envoy image: envoyproxy/envoy:v1.18.3 ports: - name: https containerPort: 8443 volumeMounts: - name: config mountPath: /etc/envoy volumes: - name: config configMap: name: envoy-conf --- apiVersion: v1 kind: Service metadata: name: envoy spec: selector: app: envoy ports: - name: https protocol: TCP port: 8443 targetPort: 8443 --- apiVersion: v1 kind: ConfigMap metadata: name: envoy-conf data: envoy.yaml: | static_resources: listeners: - name: listener_0 address: socket_address: address: 0.0.0.0 port_value: 8443 filter_chains: - filters: - name: envoy.filters.network.http_connection_manager typed_config: \u0026#34;@type\u0026#34;: type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager access_log: - name: envoy.access_loggers.stdout typed_config: \u0026#34;@type\u0026#34;: type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog codec_type: AUTO stat_prefix: ingress_https route_config: name: local_route virtual_hosts: - name: https domains: - \u0026#34;*\u0026#34; routes: - match: prefix: \u0026#34;/\u0026#34; route: cluster: echo-grpc max_grpc_timeout: 2s http_filters: - name: envoy.filters.http.router typed_config: {} clusters: - name: echo-grpc connect_timeout: 0.5s type: STRICT_DNS dns_lookup_family: V4_ONLY lb_policy: ROUND_ROBIN http2_protocol_options: {} load_assignment: cluster_name: echo-grpc endpoints: - lb_endpoints: - endpoint: address: socket_address: address: app-server-headless.default.svc.cluster.local port_value: 12345 admin:
access_log_path: /dev/stdout address: socket_address: address: 127.0.0.1 port_value: 8090 Once all of the above Kubernetes resources are deployed, we should be able to see in the client application\u0026rsquo;s logs that the traffic is load balanced across the server pods.\nIt is important to note that gRPC communication is not the same as typical REST-over-HTTP communication. The connection is established and then maintained, and messages are passed back and forth between client and server over it. Hence, the resources required to handle this form of communication are expected to be lower than for normal HTTP REST-based traffic.\nResources # List of useful URLs\nhttps://github.com/envoyproxy/envoy/tree/main/examples https://github.com/GoogleCloudPlatform/grpc-gke-nlb-tutorial https://github.com/GoogleCloudPlatform/grpc-gke-nlb-tutorial/blob/master/envoy/k8s/envoy-configmap.yaml https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/traffic_splitting ","date":"15 July 2021","externalUrl":null,"permalink":"/using-envoy-for-grpc-applications-in-kubernetes/","section":"Posts","summary":"As of now, one of the most common and easiest ways to have services communicate with each other is over HTTP. In real-world use cases, HTTPS is usually used (to ensure communications are secure), and the communication usually follows some sort of REST framework. This provides a structure that standardizes such communications across the various software applications out there. It got to the point where entire companies were built around supporting this: e.g.
Apigee, SmartBear\n","title":"Using Envoy for GRPC Applications in Kubernetes","type":"posts"},{"content":"This is definitely not an exhaustive list, but it covers some of the more obvious features that client-side users look out for when installing third party apps and operating them on their infrastructure.\nThis list was compiled while I was deploying third party applications in the course of my work/side projects.\nFrom this point onwards, assume that the applications being built here are targeted for client-side deployment (someone else takes the application and deploys/manages it on their infrastructure).\nData Management # When building applications, we may need to store state. Data needs to be stored somewhere, such as in a database, in object storage or in simple files. We shouldn\u0026rsquo;t assume that we know what kind of databases the users of the application use. Some users face restrictions on what kind of databases they can deploy into production. Others might expect to use the app heavily and be more confident that their database of choice can handle the load than whatever we might choose.\nTo allow the use of multiple types of databases, we may need to consider either building in support for multiple databases from the beginning or allowing users to build plugins that run alongside the application to store data in their own preferred database.\nBeyond the choice of database, we may need to look into how much data the app stores, how long this data should be kept and how to tune the application so that it produces less load on database systems.
These should be exposed as tuning options that give users full control over how the application impacts their production environment.\nTo sum it up, here are some questions to ponder regarding the data management aspects of building third party applications:\nFlexibility to support different types of databases/storage. Either have that support built in or provide the capability to build plugins that run alongside the main binaries to support such functionality. Data retention capability (how long to keep the data). Is there any mechanism to remove old data? Data packing capability. Where large amounts of data need to be stored, can the data be compressed and packed to reduce storage usage? Deployment # Ideally, as users of these apps, we wouldn\u0026rsquo;t want to build the deployment artifacts on our own. There are too many aspects to consider when building such artifacts (e.g. required dependencies, security requirements, configuration/initialization files to be created), and if users are expected to handle these concerns, it becomes harder for them to adopt the app in their companies (unless there is a strong, compelling reason to use it).\nHence, as the builders of apps that are meant to be deployed on client infrastructure, it is ideal for us to be in charge of building such artifacts. Unfortunately, we won\u0026rsquo;t know which target platforms the users of these applications would be using. They could be using plain old bare metal servers or virtual machines.
They could go slightly fancier and use Kubernetes or other container-based deployment platforms.\nHere are some examples of artifacts one may consider providing:\nDocker images Helm chart Kustomize scripts RPMs (CentOS/RHEL package format) DEBs (Ubuntu and Debian package format) Linux executables Windows executables Ansible scripts (maybe?) We also need to consider that the binaries could be deployed on various computer architectures, such as 32-bit, 64-bit or ARM systems.\nFor these artifacts, we need to consider where they would be made available. E.g. should docker images be put on Docker Hub? How frequently will the docker images be updated? What versioning scheme would you follow? (Follow the app version? But what if the docker image base layers need to change but not the application itself? How would that be handled?)\nIn some cases, such as the helm chart and kustomize scripts, we need to ensure that the configuration is flexible enough for users of these artifacts to customize the installation according to their requirements and architecture.\nOn top of that, if the application takes a configuration file to alter and adjust its behaviour, a significant amount of time and labour is needed to document all such behaviours so that users know how the application can be modified to suit their needs.\nPermission systems # This relates to authentication and authorization for the usage of the application.
This is to ensure that only authorized users are able to access the data in the application; we should not assume that such data can be treated as \u0026ldquo;public\u0026rdquo; just because the application is hidden behind the user\u0026rsquo;s company\u0026rsquo;s VPN.\nSome things to consider:\nCapability to set how much \u0026ldquo;power\u0026rdquo; each user has in terms of operating on data/resources in the application being built. E.g. an admin role (with full create, read and write capabilities), a reader role, etc. Capability to organize users of the application into groups. This brings up the question of whether groups can be nested within groups and whether users and groups can be mixed under a group. Capability to integrate the application with existing user systems (e.g. LDAP in a company, or Google Groups in the case where a company uses Google Workspace to manage its users). Operationability # With all applications, we need to ensure that the application being built can be operated safely in production environments. Some of the concerns here relate to monitoring, logging and the security of the binaries.\nSome of the aspects that we need to consider:\nProviding a monitoring methodology. Currently, Prometheus is one of the popular tools used to monitor applications and tools; this can be supported by exposing a Prometheus metrics endpoint. Flexibility to define where the application sends its logs (e.g. should logs be sent to a file? Or should logs be sent to stdout?) Level of logs that can be pulled from the application. Many companies utilize some sort of centralized log collection mechanism. This mechanism does have limits; these could be physical limits (storage space) or cost limits (storage of logs on cloud platforms).
By default, applications should run with minimal logging, but there should be flags and configuration to enable debug logs in order to understand how the application works from the outside. Formatting of logs. Some companies may have decided to standardize on JSON-formatted logs across applications/components. One reason to do so is to make it easier for their centralized logging system to parse and analyze such logs. Providing distributed traces to understand the application further, although this particular functionality is less important. In the case where we provide docker images for deployment, it would be ideal to ensure the image is secure as per the user\u0026rsquo;s company\u0026rsquo;s requirements. One way is to utilize small, minimal base images: e.g. https://github.com/GoogleContainerTools/distroless, alpine images or slim images. Do note that these minimal images make it harder to build such artifacts (tooling/dependencies may be missing in such images). In the case where the application is being updated, is there a smooth upgrade path that users can follow to safely upgrade the application without losing the data stored in it? As much as we would want users to continue using the application, there are cases where they would need to migrate off the tool. Maybe we need to consider a way to export the data out of the tool? In the previous point, it was mentioned that it would be good to have a capability to export data out of the tool where necessary. It might also be good to consider the reverse: a feature where data can be imported into the application. There are cases where the application that we are building would provide a frontend. In general, we would just define the paths of such frontend sites without too much worry.
However, there are cases where some users would want to put these applications behind a reverse proxy and direct users to the application via a prefix path. To support this need, we would need a feature that allows users to configure a prefix path for all the endpoints/frontend of the application. Scalability # In some cases, some of the applications might turn out to be popular in the client\u0026rsquo;s company. There might be a need to scale the applications up or out, and it would be ideal if there are methodologies/steps/metrics to look out for when attempting to scale them.\nSome points to think about:\nA guide on how to scale out the application. If the application is to be deployed on Kubernetes with a helm chart, one can utilize the HorizontalPodAutoscaler resource and define some default values that clients can use as a starting point. It might be good to document which metrics can be used to scale out the application. We can scale an application based on its CPU usage or memory usage. However, we can also go with less conventional metrics (e.g. the number of items in a queue), although the platform that hosts the application needs to be able to support this capability. Does the application need to be able to form a cluster if needed? Ideally, it would be best to avoid building clustering capability, since that would make such applications really hard to maintain/test.
In my own opinion, clustering is mainly needed for stateful applications, which doesn\u0026rsquo;t apply to many applications. Implemented Examples # Some of the above points are implemented in many open source projects out there.\nJaeger Distributed Tracing project GitHub Link: https://github.com/jaegertracing/jaeger Documentation Page: https://www.jaegertracing.io/docs/1.23/getting-started/ Elasticsearch project Definitely a good example to follow when building applications that are aimed at big companies ","date":"30 June 2021","externalUrl":null,"permalink":"/notes-for-building-apps-to-be-deployed-on-client-infrastructure/","section":"Posts","summary":"This is definitely not an exhaustive list of items to consider but definitely some of the more obvious features that client side users would look out for and consider when attempting to install such third party apps and operate it on their infrastructure.\n","title":"Notes for building apps to be deployed on client infrastructure","type":"posts"},{"content":" What and why systemd? # Systemd is a convenient set of tooling that can be used to manage services and applications on a Linux server. When we are managing applications on a server, we would want most applications to automatically have the following properties (the requirements are similar for most applications):\nThe application should restart if it panics/errors out The application should start again after the server is rebooted Logs should be handled by a tool that ideally does log rotation It would be good to follow the Filesystem Hierarchy Standard when placing the files on the server https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard\nManaging golang app with systemd # The golang application that is to be deployed is this.
It is just a simple golang application serving some quick text data:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; ) func main() { port := 8888 http.HandleFunc(\u0026#34;/\u0026#34;, helloWorldHandler) log.Printf(\u0026#34;Server starting on port %v\\n\u0026#34;, port) log.Fatal(http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, port), nil)) } func helloWorldHandler(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;serving\u0026#34;, r.URL) fmt.Fprint(w, \u0026#34;This is a test. Hello World Miaoza!!\\n\u0026#34;) } To build the golang application on a Mac, we need to cross-compile it for Linux.\nGOOS=linux GOARCH=amd64 go build -o golang-app app.go We also need to create a golang-app Linux user, which will be used to run the application, and copy the application binary over to the server.\n# In the case we need to generate a new ssh key # NOTE: We may need to connect to the public ip ssh-keygen -t ed25519 scp -i \u0026lt;ssh file\u0026gt; \u0026lt;local file\u0026gt; \u0026lt;remote file location\u0026gt; ssh -i \u0026lt;ssh file\u0026gt; \u0026lt;username\u0026gt;@\u0026lt;local ip address\u0026gt; sudo useradd golang-app sudo mv ~/golang-app /usr/local/bin/golang-app sudo vim /etc/systemd/system/golang-app.service sudo systemctl enable golang-app sudo systemctl start golang-app sudo systemctl status golang-app # To view logs of the application sudo journalctl -u golang-app -f Here is a simple systemd configuration file to run this application.
Save the following configuration to /etc/systemd/system/golang-app.service\n[Unit] Description=Golang Application Requires=network-online.target After=network-online.target [Service] User=golang-app Group=golang-app Restart=on-failure ExecStart=/usr/local/bin/golang-app KillSignal=SIGTERM [Install] WantedBy=multi-user.target For the [Install] section, refer to https://unix.stackexchange.com/questions/404667/systemd-service-what-is-multi-user-target\nTo test the application, we can open a terminal on the Linux server and use wget or curl to get an HTTP response from the application.\ncurl http://localhost:8888 Bonus Content: Use nginx to access application # Port 8888 is not a port that most people use for websites. It is best to stick to well-known ports for accessing websites: for plain (insecure) HTTP websites, that is port 80, and for websites served securely over HTTPS with SSL certificates, it is port 443.\nIf we simply change our code to use port 80, we will see the following error:\nNov 20 14:55:08 instance-20241120-143311 systemd[1]: Stopped golang-app.service - Golang Application. Nov 20 14:55:08 instance-20241120-143311 systemd[1]: Started golang-app.service - Golang Application.
Nov 20 14:55:08 instance-20241120-143311 golang-app[1367]: 2024/11/20 14:55:08 Server starting on port 80 Nov 20 14:55:08 instance-20241120-143311 golang-app[1367]: 2024/11/20 14:55:08 listen tcp :80: bind: permission denied Nov 20 14:55:08 instance-20241120-143311 systemd[1]: golang-app.service: Main process exited, code=exited, status=1/FAILURE This is because ports below 1024 are privileged ports: only root (or a process granted the CAP_NET_BIND_SERVICE capability) can bind to them, and our service runs as the unprivileged golang-app user.\nInstead of doing some trickery/hackery to get this to work, we can simply rely on nginx. nginx is a mature application that is commonly used for exactly this: it listens on port 80 and acts as a reverse proxy, forwarding requests to our application.\nsudo apt install nginx We then need to add some configuration to nginx to point it to our application (on Debian/Ubuntu, the default site is defined in /etc/nginx/sites-available/default), and then reload nginx with sudo systemctl reload nginx.\nserver_name _; location / { # First attempt to serve request as file, then # as directory, then fall back to displaying a 404. # We simply need to comment out the following line and then add proxy_pass #try_files $uri $uri/ =404; proxy_pass http://localhost:8888; } This allows us to access the web application on port 80 without running our application as the root user.\nConfiguration via Environment variables # There are a couple of ways to configure our application:\nExtract configuration from an external provider (e.g. Secrets Manager?) Configuration file Environment variables For extracting configuration from an external provider, if we\u0026rsquo;re using a cloud provider, we can utilize the service account attached to the virtual machine to access the provider\u0026rsquo;s APIs.\nIn the case of a configuration file, we would usually write our application to read and parse the file at startup. The configuration files can be in various formats such as yaml, json, toml etc. This mechanism is largely unaffected by deploying the service and managing it via systemd.\nHowever, environment variables are handled differently.
Systemd lets us pass environment variables on a per-service level (e.g. we can have 2 or 3 different long-lived services managed by systemd, each with an entirely different environment configuration).\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;os\u0026#34; ) func main() { port := 8888 applicationName := os.Getenv(\u0026#34;APPLICATION_NAME\u0026#34;) if applicationName == \u0026#34;\u0026#34; { fmt.Println(\u0026#34;APPLICATION_NAME environment variable is unset\u0026#34;) } else { fmt.Printf(\u0026#34;APPLICATION_NAME environment set: %v\\n\u0026#34;, applicationName) } http.HandleFunc(\u0026#34;/\u0026#34;, helloWorldHandler) log.Printf(\u0026#34;Server starting on port %v\\n\u0026#34;, port) log.Fatal(http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, port), nil)) } func helloWorldHandler(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;serving\u0026#34;, r.URL) fmt.Fprint(w, \u0026#34;This is a test. Hello World Miaoza!!\\n\u0026#34;) } We need to alter our systemd unit file slightly by adding the following in the [Service] section.\nEnvironment=\u0026#34;APPLICATION_NAME=miao\u0026#34; The full unit file then becomes: [Unit] Description=Golang Application Requires=network-online.target After=network-online.target [Service] Environment=\u0026#34;APPLICATION_NAME=miao\u0026#34; User=golang-app Group=golang-app Restart=on-failure ExecStart=/usr/local/bin/golang-app KillSignal=SIGTERM [Install] WantedBy=multi-user.target Once we have made the change, we need to reload the systemd configuration and then restart the service.\nsudo systemctl daemon-reload sudo systemctl restart golang-app Limiting resources via systemd # The above set of files and configuration sets up a basic golang application that can be managed with systemctl. Let\u0026rsquo;s change things up and look at another feature that comes with systemd: it can be used to restrict the resources available to an application.
We can limit CPU, memory, IO, the number of tasks, etc.\nIn the following example, we have an application that keeps allocating memory. Once it hits the 1 GB limit, the application should be killed (in order to demonstrate the limits being set on the application).\nEach request appends a set of bytes to the storeValue variable a given number of times, and the total number of stored blocks is logged.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;strconv\u0026#34; \u0026#34;sync\u0026#34; ) /* mu guards storeValue, since net/http serves requests concurrently */ var ( mu sync.Mutex storeValue = [][]byte{} ) func main() { port := 8888 http.HandleFunc(\u0026#34;/\u0026#34;, helloWorldHandler) log.Printf(\u0026#34;Server starting on port %v\\n\u0026#34;, port) log.Fatal(http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, port), nil)) } func helloWorldHandler(w http.ResponseWriter, r *http.Request) { log.Println(\u0026#34;serving\u0026#34;, r.URL) num := r.URL.Query().Get(\u0026#34;number\u0026#34;) n, err := strconv.Atoi(num) if err != nil { n = 5 } mu.Lock() for i := 0; i \u0026lt; n; i++ { a := []byte(\u0026#34;abcdefghijklmnopqrstuvwxyz\u0026#34;) storeValue = append(storeValue, a) } size := len(storeValue) mu.Unlock() log.Printf(\u0026#34;Size of data: %v\u0026#34;, size) fmt.Fprintf(w, \u0026#34;Added %v memory blocks\u0026#34;, n) } The available resource control settings are documented here: https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html\nsudo mv ~/golang-app /usr/local/bin/golang-app sudo vim /etc/systemd/system/golang-app.service # To check that the settings were set correctly sudo systemctl daemon-reload sudo systemctl show golang-app sudo systemctl restart golang-app The important parts to be added are:\nMemoryAccounting=true MemoryMax=1G The full systemd unit file for the golang application is this:\n[Unit] Description=Golang Application Requires=network-online.target After=network-online.target [Service] User=golang-app Group=golang-app Restart=on-failure
ExecStart=/usr/local/bin/golang-app KillSignal=SIGTERM MemoryAccounting=true MemoryMax=1G [Install] WantedBy=multi-user.target To verify this, we can check the status of the application via sudo systemctl status golang-app. Notice the memory field and how there is a \u0026ldquo;max\u0026rdquo; value there.\n● golang-app.service - Golang Application Loaded: loaded (/etc/systemd/system/golang-app.service; enabled; vendor preset: enabled) Active: active (running) since Sat 2021-06-12 19:30:08 UTC; 5min ago Main PID: 1124 (golang-app) Tasks: 5 (limit: 4665) Memory: 4.5M (max: 1.0G) CGroup: /system.slice/golang-app.service └─1124 /usr/local/bin/golang-app With that, if we run the following curl command multiple times, we would eventually hit the 1 GB memory limit. Once this limit is crossed, the kernel\u0026rsquo;s OOM killer terminates our application, and because of the Restart=on-failure setting, systemd then restarts it shortly afterwards. We can use other utilities such as top to monitor resource utilization on the server.\ncurl \u0026#39;localhost:8888?number=1000000\u0026#39; Using systemd for cron jobs # Let\u0026rsquo;s switch things up once more and show another interesting capability: systemd can also be used to run periodic tasks via timer units.\nA one-shot application to showcase this feature simply prints the date and time:\npackage main import ( \u0026#34;log\u0026#34; \u0026#34;time\u0026#34; ) func main() { log.Printf(\u0026#34;Current Time: %v\u0026#34;, time.Now()) } Building the application:\nGOOS=linux GOARCH=amd64 go build -o golang-time-printer app.go We then need to do similar steps as above to copy the binary over and to create the 2 systemd unit files (a service and a timer) that set up the periodic task.
Once more, we need to copy the binary over and create the required systemd unit files.\nscp -i \u0026lt;ssh file\u0026gt; \u0026lt;local file\u0026gt; \u0026lt;remote file location\u0026gt; sudo mv ~/golang-time-printer /usr/local/bin/golang-time-printer sudo vim /etc/systemd/system/golang-time-printer.service sudo systemctl enable golang-time-printer sudo systemctl start golang-time-printer sudo systemctl status golang-time-printer Save the following service file to /etc/systemd/system/golang-time-printer.service\n[Unit] Description=Print the date and time Wants=golang-time-printer.timer [Service] Type=oneshot ExecStart=/usr/local/bin/golang-time-printer [Install] WantedBy=multi-user.target Save the following timer file to /etc/systemd/system/golang-time-printer.timer. It runs the application defined by golang-time-printer.service every minute.\n[Unit] Description=Print the date and time Requires=golang-time-printer.service [Timer] Unit=golang-time-printer.service OnCalendar=*-*-* *:*:00 [Install] WantedBy=timers.target We can enable and start the timer, and then check its status, via the following commands:\nsudo systemctl enable golang-time-printer.timer sudo systemctl start golang-time-printer.timer sudo systemctl status golang-time-printer.timer sudo systemctl list-timers This is an example of the timer\u0026rsquo;s status output:\n● golang-time-printer.timer - Print the date and time Loaded: loaded (/etc/systemd/system/golang-time-printer.timer; enabled; vendor preset: enabled) Active: active (waiting) since Sat 2021-06-12 19:56:56 UTC; 1min 13s ago Trigger: Sat 2021-06-12 19:59:00 UTC; 49s left Jun 12 19:56:56 instance-1 systemd[1]: Started Print the date and time.
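A side note on the OnCalendar=*-*-* *:*:00 expression in the timer file above. This is a hedged sketch, not a required change. The expression reads as year-month-day hour:minute:second, with * matching any value, so it fires at second 0 of every minute. systemd also accepts shorthand names for common schedules, and minutely expands to this same expression, so the [Timer] section could equivalently be written as shown here. Running systemd-analyze calendar minutely on the server prints the normalized form and the next elapse time, which is a handy way to check an expression before using it.\n[Timer] Unit=golang-time-printer.service OnCalendar=minutely\n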
If we list the timers via the systemctl list-timers command:\nNEXT LEFT LAST PASSED UNIT ACTIVATES Sat 2021-06-12 20:00:00 UTC 50s left Sat 2021-06-12 19:59:04 UTC 5s ago golang-time-printer.timer golang-time-printer We can check the logs via journalctl:\nsudo journalctl -u golang-time-printer -f This is a sample of the logs:\nJun 12 19:58:02 instance-1 systemd[1]: golang-time-printer.service: Succeeded. Jun 12 19:58:02 instance-1 systemd[1]: Started Print the date and time. Jun 12 19:59:04 instance-1 systemd[1]: Starting Print the date and time... Jun 12 19:59:04 instance-1 golang-time-printer[1717]: 2021/06/12 19:59:04 Current Time: 2021-06-12 19:59:04.16838165 +0000 UTC m=+0.000189497 Jun 12 19:59:04 instance-1 systemd[1]: golang-time-printer.service: Succeeded. Jun 12 19:59:04 instance-1 systemd[1]: Started Print the date and time. Jun 12 20:00:01 instance-1 systemd[1]: Starting Print the date and time... Jun 12 20:00:01 instance-1 golang-time-printer[1738]: 2021/06/12 20:00:01 Current Time: 2021-06-12 20:00:01.763439136 +0000 UTC m=+0.000099331 Jun 12 20:00:01 instance-1 systemd[1]: golang-time-printer.service: Succeeded. Compared to older ways of managing periodic tasks, such as cron, the nice part of having periodic tasks managed by systemd is that all logs are managed through a single interface; there is no need to figure out, for each cron task, how its logs are managed, how many resources it uses, and how frequently it runs.\n","date":"10 June 2021","externalUrl":null,"permalink":"/using-systemd-to-manage-services/","section":"Posts","summary":"What and why systemd? # Systemd is a convenient set of tooling that can be used to manage services and applications on a Linux server.
When we are managing applications on a server, we would want most applications to automatically have the following properties (the requirements are similar for most applications):\n","title":"Using systemd to manage services","type":"posts"},{"content":"Install nginx on the instance. We would probably also need to install vim to make changes to the nginx configuration.\nsudo apt update \u0026amp;\u0026amp; sudo apt install -y nginx vim sudo su mkdir -p /etc/nginx/ssl cd /etc/nginx/ssl Create the following file as ca.config. This CA configuration is used to create and configure the SSL certificates.\n[ ca ] # `man ca` default_ca = CA_default [ CA_default ] copy_extensions = copy [req] distinguished_name = req_distinguished_name x509_extensions = server_cert req_extensions = server_cert [req_distinguished_name] commonName = commonname [ v3_ca ] subjectKeyIdentifier = hash authorityKeyIdentifier = keyid:always,issuer basicConstraints = critical, CA:true keyUsage = critical, digitalSignature, cRLSign, keyCertSign [ server_cert ] # Extensions for server certificates (`man x509v3_config`). basicConstraints = CA:FALSE nsCertType = server nsComment = \u0026#34;OpenSSL Generated Server Certificate\u0026#34; subjectKeyIdentifier = hash #authorityKeyIdentifier = keyid,issuer:always keyUsage = critical, digitalSignature, keyEncipherment extendedKeyUsage = serverAuth subjectAltName = @alternate_names [ alternate_names ] DNS.1 = localhost DNS.2 = lol.testtest.com Refer to the documentation for more details:\nhttps://jamielinux.com/docs/openssl-certificate-authority/create-the-root-pair.html\nAnother useful piece of documentation:\nhttps://www.openssl.org/docs/man3.0/man5/x509v3_config.html\nThis page describes the various x509 v3 certificate extension options available for use.
It explains the various options and what each of them is for.\nopenssl genrsa -out ca.key 2048 openssl rsa -in ca.key -pubout \u0026gt; ca.pub # Certificate \u0026#34;request\u0026#34; but produces a self signed cert instead openssl req -x509 -config ca.config -new -nodes -key ca.key -sha256 -days 365 -out ca.pem -extensions v3_ca Create the server SSL certificate:\nopenssl genrsa -out dev.app.key.server 2048 # Certificate \u0026#34;request\u0026#34; # In order to make it easier - use *.example.com for common name openssl req -new -key dev.app.key.server -out dev.app.csr openssl x509 -req -in dev.app.csr -CA ca.pem -CAkey ca.key -CAcreateserial -out dev.app.crt.server -days 365 -sha256 -extfile ca.config -extensions server_cert # The certificate file holds the server certificate followed by the CA certificate cp dev.app.crt.server dev.app.crt cat ca.pem \u0026gt;\u0026gt; dev.app.crt # The key file must hold only the server private key; never append ca.key to it cp dev.app.key.server dev.key.crt With that, edit the nginx configuration in /etc/nginx/sites-available/default to allow SSL traffic. Ensure that one of the server blocks listens on the HTTPS port, 443, with ssl enabled, and uses the SSL certificate and key that we created:\nserver { listen 443 ssl default_server; ssl_certificate /etc/nginx/ssl/dev.app.crt; ssl_certificate_key /etc/nginx/ssl/dev.key.crt; ... } We can go into another VM instance on Google Compute Engine and curl the server instance from there, after copying the ca.pem file over from the server. (Google Cloud instances in the same VPC network can usually reach each other by instance name, which is why instance-1 resolves below.)\ncurl --cacert ca.pem https://instance-1 We would get the following error:\ncurl: (60) SSL: no alternative certificate subject name matches target host name \u0026#39;instance-1\u0026#39; More details here: https://curl.haxx.se/docs/sslcerts.html curl failed to verify the legitimacy of the server and therefore could not establish a secure connection to it. To learn more about this situation and how to fix it, please visit the web page mentioned above. Note from above that the certificate only accepts 2 domains: localhost and lol.testtest.com.
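One possible alternative to editing /etc/hosts is sketched here. It is an assumption on our part, not the approach this post uses. The idea is to reissue the server certificate with the instance name (and, optionally, the server IP address) added to the alternate_names section of ca.config. This works because the x509v3 subjectAltName extension accepts both DNS and IP entries. The IP address below is a hypothetical placeholder for the server internal IP. After re-running the openssl x509 signing step, rebuilding dev.app.crt and reloading nginx, curl --cacert ca.pem https://instance-1 should pass the host name check.\n[ alternate_names ] DNS.1 = localhost DNS.2 = lol.testtest.com DNS.3 = instance-1 IP.1 = 10.128.0.2\n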
On the client instance, add the IP address of the server, mapped to the lol.testtest.com domain, to the /etc/hosts file.\nWe can then curl lol.testtest.com, one of the domains specified in the SSL certificate, and obtain the response.\ncurl --cacert ca.pem https://lol.testtest.com Create a client SSL certificate request. The certificate request needs to be passed to the instance that holds the CA certificate and key so that it can be signed:\nopenssl genpkey -algorithm RSA -out client.key -pkeyopt rsa_keygen_bits:2048 openssl req -new -key client.key -out client.req -subj /CN=testtest Sign it and copy the signed certificate back to the client instance:\nopenssl x509 -req -in client.req -CA ca.pem -CAkey ca.key -set_serial 101 -extensions client -days 365 -sha256 -outform PEM -out client.crt openssl x509 -in client.crt -noout -text Adjust the nginx configuration in /etc/nginx/nginx.conf:\nhttp { map $ssl_client_s_dn $allowed { default no; \u0026#34;CN=testtest\u0026#34; yes; } ... Then adjust the nginx configuration in /etc/nginx/sites-available/default:\nserver { ... listen 443 ssl default_server; ssl_certificate /etc/nginx/ssl/dev.app.crt; ssl_certificate_key /etc/nginx/ssl/dev.key.crt; ssl_verify_client on; ssl_client_certificate /etc/nginx/ssl/ca.pem; if ($allowed = \u0026#34;no\u0026#34;) { return 403; } ...
We can run the curl request again:\ncurl --cacert ca.pem https://lol.testtest.com But we would receive the following response:\n\u0026lt;html\u0026gt; \u0026lt;head\u0026gt;\u0026lt;title\u0026gt;400 No required SSL certificate was sent\u0026lt;/title\u0026gt;\u0026lt;/head\u0026gt; \u0026lt;body bgcolor=\u0026#34;white\u0026#34;\u0026gt; \u0026lt;center\u0026gt;\u0026lt;h1\u0026gt;400 Bad Request\u0026lt;/h1\u0026gt;\u0026lt;/center\u0026gt; \u0026lt;center\u0026gt;No required SSL certificate was sent\u0026lt;/center\u0026gt; \u0026lt;hr\u0026gt;\u0026lt;center\u0026gt;nginx/1.14.2\u0026lt;/center\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; We need to pass the client SSL certificate and key as well:\ncurl --cacert ca.pem --cert client.crt --key client.key https://lol.testtest.com Additional Information # If we simply wish to set up an SSL certificate in nginx without client authentication (so that tools such as curl trust it without --cacert), we just need to copy the \u0026ldquo;root\u0026rdquo; ca.pem into /usr/local/share/ca-certificates. Ensure that the file name ends with .crt. We can do so by copying it (assuming we\u0026rsquo;re in the folder where we created all the certs): cp ca.pem /usr/local/share/ca-certificates/ca.crt. The next step is to update the CA store by running sudo update-ca-certificates. However, this only affects the OS-level CA store; browsers that maintain their own CA stores still need to be updated separately. ","date":"10 May 2021","externalUrl":null,"permalink":"/basic-ssl-setup-server-and-client-ssl-certificate-setup/","section":"Posts","summary":"Install nginx on the instance. We would probably also need to install vim to make changes to the nginx configuration.\nsudo apt update \u0026\u0026 sudo apt install -y nginx vim sudo su mkdir -p /etc/nginx/ssl cd /etc/nginx/ssl Create the following file as ca.config.
This CA configuration is used to create and configure the SSL certificates.\n","title":"Basic SSL Setup - server and client SSL certificate setup","type":"posts"},{"content":"NOTE: As software advances, some of the commands shown below may become deprecated/irrelevant. If you encounter errors, check the output logs to see what the issue is (e.g. a missing library? a missing dependency? a wrong folder structure causing a file not to be found?)\nThe commands below need to be run on a machine or VM running CentOS 7. We cannot run them in docker, because systemd is not readily usable in docker; there are hacks, but it\u0026rsquo;s better to just run these in a virtual machine.\nThese are some notes on building an Nginx RPM for CentOS. They can be used as a starting point to further customize the Nginx RPM.\nCreate a Google Compute Engine instance with the CentOS 7 OS.\nInstall the required yum dependencies; some of them are needed to run the commands (e.g. make, wget, git etc.):\nsudo yum install -y git wget make \\ gcc rpm-build GeoIP-devel zlib-devel \\ pcre-devel gd-devel libedit-devel which \\ perl-devel perl-ExtUtils-Embed libxslt-devel \\ openssl-devel Import the nginx packaging source code. Apparently, Google mirrored the Mercurial nginx packaging repository (pkg-oss) to the following git repo.
We can use git to clone the source code, check out one of the latest versions of nginx and build the RPM.\n# Clone source code git clone https://nginx.googlesource.com/nginx-pkgoss # Enter the folder which contains source code cd nginx-pkgoss # Go to specific version of nginx release git checkout nginx-1.19.8 # Go into rpm folder to view the Makefiles cd ./rpm/SPECS # Run make command to build all modules - there are other options # The main one to ensure that it is possible to build would be \u0026#34;base\u0026#34; make all View the built RPM and test it to ensure that we can run it:\n# Find the built nginx rpms cd $HOME/nginx-pkgoss/rpm/RPMS/x86_64 # Install the dependencies for nginx sudo yum install -y openssl # Check openssl version to ensure it is installed openssl version # Install the built nginx rpm sudo rpm -i nginx-1.19.8-1.el7.ngx.x86_64.rpm # Check nginx is installed and is available for use nginx -V # Start nginx sudo systemctl start nginx # Check to ensure that nginx is working and is in running state sudo systemctl status nginx # Run curl command to ensure that nginx is able to actually serve the traffic curl localhost This is the output you should receive, showing that the nginx server has started properly and can serve traffic:\n\u0026lt;!DOCTYPE html\u0026gt; \u0026lt;html\u0026gt; \u0026lt;head\u0026gt; \u0026lt;title\u0026gt;Welcome to nginx!\u0026lt;/title\u0026gt; \u0026lt;style\u0026gt; body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } \u0026lt;/style\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;h1\u0026gt;Welcome to nginx!\u0026lt;/h1\u0026gt; \u0026lt;p\u0026gt;If you see this page, the nginx web server is successfully installed and working.
Further configuration is required.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;For online documentation and support please refer to \u0026lt;a href=\u0026#34;http://nginx.org/\u0026#34;\u0026gt;nginx.org\u0026lt;/a\u0026gt;.\u0026lt;br/\u0026gt; Commercial support is available at \u0026lt;a href=\u0026#34;http://nginx.com/\u0026#34;\u0026gt;nginx.com\u0026lt;/a\u0026gt;.\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;\u0026lt;em\u0026gt;Thank you for using nginx.\u0026lt;/em\u0026gt;\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; ","date":"1 May 2021","externalUrl":null,"permalink":"/building-nginx-rpm-from-source/","section":"Posts","summary":"NOTE: As software advances, some of the commands shown below may become deprecated/irrelevant. If you encounter errors, check the output logs to see what the issue is (e.g. a missing library? a missing dependency? a wrong folder structure causing a file not to be found?)\n","title":"Building Nginx RPM from source","type":"posts"},{"content":"These are some notes for the case where one wants to deploy a bunch of Python \u0026ldquo;microservices\u0026rdquo; to a Google Kubernetes Engine cluster. These notes emphasize the basics rather than the various nuances of running a \u0026ldquo;production\u0026rdquo; grade Python application.\nThis is the simple Python Flask application that we will deploy:\nfrom flask import Flask app = Flask(__name__) @app.route(\u0026#39;/\u0026#39;) def hello_world(): return \u0026#39;Hello, World!\\n\u0026#39; if __name__ == \u0026#39;__main__\u0026#39;: app.run(host=\u0026#39;0.0.0.0\u0026#39;, port=8080) To run the Flask app, it is best to have a requirements file (requirements.txt) that pins the various dependencies:\nclick==7.1.2 Flask==1.1.2 itsdangerous==1.1.0 Jinja2==2.11.3 MarkupSafe==1.1.1 Werkzeug==1.0.1 requests==2.25.1 Since we\u0026rsquo;re deploying it to Google Kubernetes Engine, we need to package it as a container image. We will do this by using Docker.
To build the container image, we need to create a Dockerfile for it:\nFROM python:3.6 COPY requirements.txt . RUN pip install -r requirements.txt COPY . . CMD python sample-flask-app.py To deploy such an application to Google Kubernetes Engine, we can run the following commands. (This assumes that the Google Kubernetes Engine cluster already exists and that we have connected to it, so our kubectl tool can query and modify the cluster.)\n# Building the image docker build -t sample-app . # After building, test that the app works as expected docker run -p 8080:8080 -d sample-app # Retag the image for Google Container Registry docker tag sample-app gcr.io/XXX/sample-app:v1 # Push to Google Container Registry docker push gcr.io/XXX/sample-app:v1 # Create a \u0026#34;deployment\u0026#34; in Google Kubernetes Engine kubectl create deployment sample-app --image gcr.io/XXX/sample-app:v1 # Create a \u0026#34;service\u0026#34; in Google Kubernetes Engine # Wait for the external ip address and curl against it kubectl create service loadbalancer sample-app --tcp=80:8080 Now that the most basic setup is working, let\u0026rsquo;s move on to a scenario with multiple Python services, where one Python service calls the sample-app service from above:\n# Delete load balancer service kubectl delete service sample-app # Create internal ip - we don\u0026#39;t want to expose it this time kubectl create service clusterip sample-app --tcp=8080:8080 # Check that the service can be reached from another pod kubectl create deployment test --image=nginx kubectl exec -it \u0026lt;pod-name\u0026gt; -- /bin/bash # Inside the container apt update apt install dnsutils nslookup sample-app curl sample-app:8080 Now that we know the address our sample app is reachable on, let\u0026rsquo;s embed it into our second application. Note: it is better to make this address configurable so that operators of the application can change it if needed. 
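Here is a minimal sketch of one way to make the address configurable. It assumes a hypothetical SAMPLE_APP_URL environment variable (a name chosen for this example, not something Flask or Kubernetes defines) that falls back to the in-cluster service address found above. It only resolves the address and uses just the standard library, so it could be dropped into the Flask app:

```python
import os

# Fallback used when no override is provided; matches the in-cluster
# service name and port created with kubectl above.
DEFAULT_SAMPLE_APP_URL = 'http://sample-app:8080'


def sample_app_url(env=None):
    # SAMPLE_APP_URL is a hypothetical variable name chosen for this sketch.
    # Passing env explicitly makes the lookup easy to test; by default the
    # real process environment is used.
    if env is None:
        env = os.environ
    url = env.get('SAMPLE_APP_URL', DEFAULT_SAMPLE_APP_URL)
    # Strip a trailing slash so callers can safely append paths.
    return url.rstrip('/')


if __name__ == '__main__':
    print(sample_app_url())
```

With this helper, the hard-coded call in the Flask handler would become requests.get(sample_app_url()). Operators could then override the address per environment without rebuilding the image, for example through an environment entry for first-service in docker-compose, or with kubectl set env deployment/first-service SAMPLE_APP_URL=... on the cluster.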
If the address is hard-coded, as it is below, the app has to be rebuilt every time the address of the sample-app service changes.\nfrom flask import Flask import requests app = Flask(__name__) @app.route(\u0026#39;/\u0026#39;) def hello_world(): return \u0026#39;First Service!\\n\u0026#39; @app.route(\u0026#39;/main\u0026#39;) def first_service_handler(): resp = requests.get(\u0026#34;http://sample-app:8080\u0026#34;) return resp.text if __name__ == \u0026#39;__main__\u0026#39;: app.run(host=\u0026#39;0.0.0.0\u0026#39;, port=8080) This is its Dockerfile, which reuses the same requirements.txt file as above:\nFROM python:3.6 COPY requirements.txt . RUN pip install -r requirements.txt COPY . . CMD python first-service-app.py How do we test it locally, though?\nWe would generally use docker-compose here, since it\u0026rsquo;s pretty troublesome to work through the whole Docker networking stack by hand to get the various services talking to each other.\nSimulating the above in a docker-compose setup looks like this:\nversion: \u0026#34;3.5\u0026#34; services: \u0026#34;sample-app\u0026#34;: build: context: . ports: - \u0026#34;8080:8080\u0026#34; \u0026#34;first-service\u0026#34;: build: context: . dockerfile: first-service.Dockerfile ports: - \u0026#34;8081:8080\u0026#34; Note this odd issue when using it in Google Cloud Shell: https://github.com/google-github-actions/setup-gcloud/issues/128\nexport LD_LIBRARY_PATH=/usr/local/lib To get all the Python services above running locally, we can run the following commands:\n# To bring all the services up docker-compose up # To bring it all down docker-compose down With local testing out of the way, let\u0026rsquo;s now focus on deploying the first-service application:\n# Build the first service docker image docker build -t gcr.io/XXX/first-service:v1 -f first-service.Dockerfile . 
# Push the first service docker image docker push gcr.io/XXX/first-service:v1 # Deploy service kubectl create deployment first-service --image=gcr.io/XXX/first-service:v1 # Create load balancer to have traffic go to it kubectl create service loadbalancer first-service --tcp=80:8080 ","date":"18 April 2021","externalUrl":null,"permalink":"/python-flask-apps-in-kubernetes/","section":"Posts","summary":"These are some notes for the case where one wants to deploy a bunch of Python “microservices” to a Google Kubernetes Engine cluster. These notes focus on the basics rather than the various nuances of running a “production” grade Python application.\n","title":"Python Flask Apps in Kubernetes","type":"posts"},{"content":"Sometime earlier this year (2021), Google Cloud Run started supporting websockets, which are critical for running an R Shiny dashboard application.\nRefer to the following documentation on the Google Cloud Run website:\nhttps://cloud.google.com/run/docs/release-notes\nhttps://cloud.google.com/run/docs/triggering/websockets\nLet\u0026rsquo;s see how to quickly get an R Shiny Server application running on Google Cloud Run. But before we get one running on the Google Cloud Run service, let\u0026rsquo;s try to get one running on our local computer.\nWe can quickly run a Docker image that already has Shiny installed in it (rocker/shiny) and test it out on our local computer.\ndocker run --rm -p 3838:3838 rocker/shiny:4.0.0 If we head over to the local URL http://localhost:3838/sample-apps/hello/, the page displayed should be interactive; it shouldn\u0026rsquo;t be completely grayed out. A grayed-out page means that the websocket connection has expired or broken. 
Unfortunately, since this runs on your own local computer, any issues that arise would have to be debugged manually on your own.\nThere are other sample apps available in this image. You can check them out before configuring further:\nhttp://localhost:3838/01_hello/ http://localhost:3838/02_text/ http://localhost:3838/03_reactivity/ You can find out more by exploring the following folder in the container: /srv/shiny-server\nPushing the rocker/shiny image to Google Container Registry/Artifact Registry # The first step before we can deploy such a service to Google Cloud Run is to get the image into the container/artifact registry of the project we wish to deploy in.\nEnsure that our docker CLI tool can authenticate to our Google Cloud project:\nhttps://cloud.google.com/container-registry/docs/advanced-authentication#gcloud-helper\ngcloud auth login gcloud auth configure-docker With that, we can push the image to the respective project\u0026rsquo;s container/artifact registry. To push to Container Registry (substitute the project id accordingly):\n# Re-tag the public rocker/shiny docker image to point to gcr.io/\u0026lt;project-id\u0026gt; registry docker tag rocker/shiny:4.0.0 gcr.io/\u0026lt;project-id\u0026gt;/rocker/shiny:4.0.0 # Push image docker push gcr.io/\u0026lt;project-id\u0026gt;/rocker/shiny:4.0.0 Although it\u0026rsquo;s also possible to push the image to Artifact Registry and deploy the Cloud Run service from there, we will not cover that here. 
At first glance, it looks way too pricey compared to just relying on Google Container Registry instead.\nCreate the Cloud Run Service # To deploy the service, we can either use the UI or run the following gcloud command:\ngcloud run deploy shiny-dashboard --concurrency=1 --memory=2Gi --platform=managed --region=asia-northeast1 --allow-unauthenticated --port 3838 --image=gcr.io/\u0026lt;project-id\u0026gt;/rocker/shiny:4.0.0 We can then test that the whole setup works as expected. Do note that this setup has no authentication, which is a problem if the R Shiny dashboard is meant for internal company access only.\nExpose dashboards for selected users # If all we wanted to do was to deploy the dashboard on Google Cloud Run, we could stop with the steps above. However, in many cases, we deal with \u0026ldquo;private\u0026rdquo; datasets that should only be accessed within the company; they should not be exposed to the public.\nDo take note that the R Shiny library doesn\u0026rsquo;t have anything that deals with authentication (at least in the open source edition). If you were to look into the enterprise-level Shiny offerings, you might be able to find some sort of authentication mechanism.\nSo, to protect our data/dashboards, it would be good to put some sort of proxy in front of the dashboards. We can do that via nginx, but the authentication options available out of the box might be a little limited. If we want to authenticate using Google accounts, for example, nginx may not readily support that; we might need an external mechanism working together with nginx to handle it. 
If we\u0026rsquo;re OK with password-based authentication, doing it via nginx might be quite easy, although it would still involve setting up nginx on some virtual machine.\nWe can instead look into another mechanism that Google Cloud already provides: IAP (short for Identity-Aware Proxy). Refer to the following documentation for details on this capability: https://cloud.google.com/iap/docs/concepts-overview\nIAP is not directly integrated with Google Cloud Run, but it is integrated with Google Cloud Load Balancer, and Google Cloud Load Balancer can serve traffic with Google Cloud Run as its backend. Refer to the following documentation for details on this capability: https://cloud.google.com/load-balancing/docs/https/setting-up-https-serverless\nWith that, we can put IAP in front of our Shiny dashboard via Google Cloud Load Balancer by doing the following:\nDeploy the Google Cloud Run service and ensure that it requires no authentication, but is only exposed to the private network and the load balancer Configure a Google Cloud Load Balancer (GCLB) with backends This might require creating serverless network endpoint groups that point to our already deployed Google Cloud Run service Ensure that you have a domain so that you don\u0026rsquo;t have to create the SSL certificates manually. Alternatively, you can create self-signed SSL certificates, but you would need to be fairly familiar with the commands involved. For convenience, it is better to just purchase a domain and let Google manage the SSL certificate Expect to wait about 10-15 minutes before the load balancer responds properly; it takes a while to provision At this point, test to make sure that the Shiny dashboard can be accessed from a browser via the domain pointed to it. Grant the Google users who should access the Shiny dashboard the IAP-secured Web App User role. 
Reference: https://cloud.google.com/iap/docs/app-engine-quickstart ","date":"5 March 2021","externalUrl":null,"permalink":"/cloud-run-websocket-support-now-allows-you-to-deploy-a-r-shiny-server-as-a-serverless-app-to-gcp-cloud-run/","section":"Posts","summary":"Sometime earlier this year (2021), Google Cloud Run started supporting websockets, which are critical for running an R Shiny dashboard application.\n","title":"Cloud Run Websocket support now allows you to deploy a R Shiny Server as a serverless app to GCP Cloud Run","type":"posts"},{"content":"A long time back, sometime in 2019 (which feels like an eternity ago), I built an application that can take some slides saved in a PDF file and generate a video out of them. I talked about it in a lightning session during the following event at Google Devspace: https://events.withgoogle.com/la-kopi-serverless/. The input to the application is the slides in PDF format as well as some sort of \u0026ldquo;script\u0026rdquo;. The words in the script are used to generate the voiceover, which then becomes part of the video. Essentially, the aim of the app is to create a \u0026ldquo;presented\u0026rdquo; version of the slides in video form without requiring a person to present it. Everything is generated via tools/products available on GCP.\nThe project is a pet project that I have continued working on; 2 years later, its structure is very different from when I first started on the application. I will list out the changes and explain why each was made.\nAlso, this was previously a closed source application. 
I\u0026rsquo;ve cleaned up the repo so that it should be fine as open source code, but there is still plenty of work to be done.\nThe link to the repository is here:\nhttps://github.com/hairizuanbinnoorazman/slides-to-video\nMoving from separate git repos into one repo # When I first started the project, the application was split into multiple microservices. It made sense then, since the application\u0026rsquo;s main deployment target was Google Cloud Run. For the uninitiated, Google Cloud Run is essentially you (the user) getting Google to run Docker containers on your behalf. Pricing is based on the utilization of the container, which makes it a very cheap option for building and deploying small personal projects.\nThe project was spread across 5 git repos, organized as follows:\nAPI (as well as manager) PDF Splitter Service (receives tasks as Jobs) Generate Short Video Snippets on per slide basis (receives tasks as Jobs) Concatenating Short Video Snippets into one final video (receives tasks as Jobs) Frontend (Basic Elm frontend) The 5 microservices are deployed as 5 different services, with Pubsub as the central plumbing that connects the \u0026ldquo;API server\u0026rdquo;, which acts as a manager of sorts, with the worker services (PDF Splitter, Generate Short Video Snippets and Concatenating Short Video Snippets).\nHowever, since the microservices were always developed with Google Cloud Run as the deployment target, they were really difficult to test. This is the opposite of the promise of containers, where we can run the same container in a dev environment and test it locally before deploying it to production.\nThe problem is not at the level of an individual microservice. Each microservice is relatively easy to test on its own since it exposes an HTTP endpoint; we can just call that endpoint and check that the required output is produced. 
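A minimal sketch of such a per-service check follows. It is written in Python to match the pytest-based integration tests mentioned later in this post, and it uses only the standard library. The stub server, its response body and the check_endpoint helper are hypothetical stand-ins; against a real deployment, check_endpoint would be pointed at the actual URL of the service under test instead of the stub:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_endpoint(url, expected_text, timeout=5):
    # Call one service endpoint and report whether it answered with HTTP 200
    # and a body containing expected_text. Error statuses (4xx/5xx) make
    # urlopen raise urllib.error.HTTPError instead of returning.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode('utf-8')
        return resp.status == 200 and expected_text in body


class StubHandler(BaseHTTPRequestHandler):
    # Hypothetical stand-in for a single microservice.
    def do_GET(self):
        payload = b'ok'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def run_stub_check():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(('127.0.0.1', 0), StubHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return check_endpoint('http://127.0.0.1:%d/' % port, 'ok')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(run_stub_check())
```

The same check_endpoint call could be wrapped in one pytest test function per service. It does not cover the Pubsub-driven integration between the services, which is the harder part.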
Unfortunately, it is close to impossible to test the integration of the 5 microservices locally (due to the reliance on Google Pubsub to invoke the worker microservices etc). Also, there are hard-coded URLs peppered all over the codebase across the 5 microservices, making it impossible to have a local environment to test with.\nAfter the initial demo during the event at Google Devspace, I reflected on this and decided that a local environment is important. It doesn\u0026rsquo;t make sense to keep deploying to Google Cloud Run just to test the functionality of the application. At the same time, I wanted the application to be deployable to various other environments, including plain virtual machines as well as Kubernetes. This meant that a huge re-architecting of the codebase was needed.\nBefore making any changes, I decided to look at how other open source projects structure their code within their git repositories. Some examples I referenced were the Loki (Grafana) and Jaeger (Jaegertracing) codebases. These projects are also made up of microservices, yet the code for all of them lives in a single git repository. Following these examples, the Slides to Video application that I am building adopts a similar code structure.\nThere are definite benefits to this change:\nEasier code maintenance. Code is no longer spread across 5 git repositories but is accessible from a single git repository. The alternative would be to set up a sixth git repository with 5 git submodules pointing to the separate code repositories Easier to set up a docker-compose file that stitches all 5 microservices together into one integrated setup Less code duplication. Many of the worker services need to interact with the same Google Cloud services (mainly Google Cloud Storage). 
Previously, the same code to interact with it was copied across the various microservices, and any change had to be replicated across all of them. If the microservices had stayed in separate git repositories, the alternative would be to set up a \u0026ldquo;common\u0026rdquo; Golang library, which seems like overkill in this scenario Easier to set up and ensure that the versions of all microservices in this project are synchronized. There is no need to think too much about version compatibility between the microservices There are possible drawbacks to such a setup, though:\nHarder to set up CI workflows for each individual microservice. In the case where code is only changed in one of the microservices, how do we limit the testing to only that microservice? What if the changed code is shared code that is used across multiple microservices? Or do we just bite the bullet and test all the code each time any code is changed? Deciding a frontend technology # There are too many frontend frameworks, tools and languages out there, each with its own benefits and drawbacks. Let\u0026rsquo;s go through the various popular options:\nReactJS VueJS AngularJS Plain HTML + Javascript served from Backend Server Elm (I\u0026rsquo;ll admit this one rarely shows up in the usual popularity lists) I generally rule out the 3 JavaScript frameworks: ReactJS, VueJS and AngularJS. ReactJS moves too fast. As a backend engineer, I don\u0026rsquo;t keep up with the trends of that framework, and I have a very strong feeling that even if I coded a decent frontend now, a few months later it might be \u0026ldquo;outdated\u0026rdquo; and I might have to rewrite it to keep up with the documentation. As for VueJS and AngularJS, I wouldn\u0026rsquo;t choose them for my personal project either, partly due to unfamiliarity and partly due to the fear that they move just as quickly as ReactJS. E.g. 
Angular is already hitting its 12th major version as of the release of this blog post.\nInitially, I wanted to just stick to plain HTML and JavaScript served from a backend built and compiled using Golang. However, after a short while, working with JavaScript proved to be more difficult than expected. The context switching between HTML, JavaScript and Golang made it pretty hard to handle state as well as to manage the data on the frontend. At the same time, while coding JavaScript, I realized that it\u0026rsquo;s pretty difficult to not rely on any framework; there are too many frontend concerns that I would need to handle manually without one. I even tried looking into libraries like jQuery, but after comparing the development experience with something I tried previously (Elm), I found it severely lacking.\nThis leaves Elm, a language that I tried out previously (and I kind of liked the initial experience of working with it). One of the main reasons for liking it is the slow rate of updates to the language; see the Elm version history, where version upgrades are months or years apart. Another nice aspect is the helpful error messages thrown the moment something is amiss in the Elm codebase, e.g. wrong types, typos in variables, unused variables, unhandled conditions etc. Such features make Elm nice to work with, since I won\u0026rsquo;t be spending much time on the frontend for this project; the frontend serves as a basic UI to interact with the system.\nAs much as I call out the various nice features in Elm, there are definitely things to look out for when working with it. Elm is way less popular than the JavaScript frameworks, and it shows in Google search results. 
There are fewer Stack Overflow answers to help you solve your problems, which means that there are times when one needs to tinker with the code until it works (usually the error messages help with this; I haven\u0026rsquo;t been stuck for too long while working with it)\nIntegration Tests are suddenly very important # One of the changes I thought of implementing is the capability for the project to be deployed on multiple platforms. Some of the targeted platforms for the project are Kubernetes and Google Cloud Run. In the future, I want to deploy to platforms on other cloud providers, like AWS Lambda, or even to manually set up Knative platforms.\nHowever, this capability requires plenty of QA work to ensure that the functionality of the project is consistent across the different deployments. I definitely can\u0026rsquo;t afford to do that manually; the only way to do it consistently is to write a whole suite of integration tests that check for consistent behaviour as the project is deployed to various platforms.\nThe integration tests are mostly just API calls made via pytest scripts. However, at the moment, I have not set up a proper Continuous Integration workflow to run them on the various platforms. There is definitely a need for scripts that build the artifacts, deploy them to the targeted platform and then run the pytest scripts. This would take quite a bit of effort to set up. Another thing to consider is where to run said integration tests (a manual Jenkins setup? Or Google Cloud Build? 
Or GitHub Actions?)\nI will provide an update on the continuous integration efforts in future blog posts on this project.\n","date":"28 February 2021","externalUrl":null,"permalink":"/lessons-from-building-slides-to-video-app-part-1/","section":"Posts","summary":"A long time back, sometime in 2019 (which feels like an eternity ago), I built an application that can take some slides saved in a PDF file and generate a video out of them. I talked about it in a lightning session during the following event at Google Devspace: https://events.withgoogle.com/la-kopi-serverless/. The input to the application is the slides in PDF format as well as some sort of “script”. The words in the script are used to generate the voiceover, which then becomes part of the video. Essentially, the aim of the app is to create a “presented” version of the slides in video form without requiring a person to present it. Everything is generated via tools/products available on GCP.\n","title":"Lessons from building Slides to Video App - Part 1","type":"posts"},{"content":"DISCLAIMER: The following article is just an opinion. Naturally, each person has their own work experiences that they can use to project their future plans, so take the items in this article with a large spoonful of salt when applying them to your own situation.\nLook to the business side as well Consultants Engineers in a product company Platform Engineers Narrowing down what to learn Referring to job roles before studying Personal Projects Just do the interview Look to the business side as well # When I first started my career in the technology track (software engineer, devops engineer), I initially thought that the only way to move up the company hierarchy was to get good at what you do. In the case of a software developer, that would probably mean writing efficient and useful code that gets included in products. 
And in the case of a devops engineer, it would mean being familiar with the various deployment platforms and tools on the market. This does make sense in a way; as one gets better at their job, they would and should receive larger compensation packages. The experience and expertise that the engineer has built up should allow the company to produce better services/products.\nHowever, as with all things, it is good to broaden our perspective and look at the \u0026ldquo;business side\u0026rdquo; of things. At the end of the day, engineering skills aren\u0026rsquo;t what earns the paychecks. The products and services provided to customers are what earn the revenue for the company. By understanding the \u0026ldquo;business side\u0026rdquo; of the company, we can try to understand how revenue is earned and how it affects the treatment we, as engineers, receive from a company.\nThis article mostly looks at things from the perspective of a software engineer and the various job options and routes available to him/her. However, it is pretty easy to apply the same methodology to understand future prospects in other roles; we just need to understand how the money flows and how you, as the engineer/employee, bring value to the company (the organization that hired you).\nGenerally, from my roughly 4-5 years of working experience in this sector, I would roughly segment the job options for a software engineer into the following \u0026ldquo;job types\u0026rdquo;. The business side of a company affects each role in various ways.\nLet\u0026rsquo;s approach each of the following categories one at a time.\nConsultants # In this job option, a software engineer employed by a company does not work in said company, but is instead deployed to client companies. 
From this point onwards, you as the software engineer mostly report to people in the client company, but salary and benefits are handled by the company that employs you.\nBefore proceeding further, let\u0026rsquo;s establish a common understanding: the companies that deploy you to client companies are \u0026ldquo;consultancy companies\u0026rdquo;, whereas client companies are companies that pay good money to \u0026ldquo;consultancy companies\u0026rdquo; to get some sort of workforce.\nThere are some good points to this arrangement:\nEasier to get attached to large organizations, which provides opportunities to learn how large organizations operate There are, of course, some bad points to this arrangement:\nHigh chance that one would be working on the \u0026ldquo;boring\u0026rdquo; bits of a software engineering job. Think about it: if you were one of the tech directors of the client companies, would you rely on a transient workforce to handle your core operations (the possibly interesting bits)? Pay is suppressed compared to the rest of the industry. The company that employs you is the middleman here and needs to make a profit by sending you to the client companies. On top of that, to compete, consultancy companies try to lower their prices to make their contracts look like a better deal. Engineers in a product company # In the various product companies out there, one of the common roles is the backend engineer, who creates the various features in the product that the company sells. In most cases, that means creating API services (because everyone is somehow into the microservices/REST way of doing things now) which store data in some sort of database. 
In a simplistic sense, most of the apps out there are CRUD (create, read, update, delete) applications.\nNot sure about you, but in my opinion, it\u0026rsquo;s easy to get bored writing one CRUD app after another. Eventually, creating such apps just means copying and pasting a whole chunk of code to get the required application functionality.\nAnother bad thing that usually happens to engineers at this level in a product company is that the product features that need to be developed depend on external forces such as sales/marketing. Sometimes, certain features are required in order to win a certain contract. The required features may change on the whims of the customer, making the product requirements vague. This makes it hard to build the required product.\nSome teams attempt to combat this by making features more \u0026ldquo;generalizable\u0026rdquo;. But doing this makes feature development slower: more generalized features take more time to develop and require even more time to test thoroughly.\nThere are some interesting bits, but one sometimes needs to dig a little to get to them:\nAn application that has pretty high usage and needs to be scaled to handle the incoming workload. Scaling is usually pretty hard; at times, it may require you to deconstruct aspects of your applications (e.g. the ORM in applications) and tweak them in order to create more efficient queries, or to introduce caching etc. However, it\u0026rsquo;s important to note here that not all applications need to scale, and not all applications need to run and complete tasks \u0026ldquo;asap\u0026rdquo;. It\u0026rsquo;s better to focus more resources on the applications that actually matter (the main revenue-earning services) in the products that the company sells. Refactoring applications. To some, it is a necessary evil, but to others, it gives an opportunity to understand the codebase further. 
Before refactoring, we need to understand that development work on the application will still need to continue; it cannot be put on a complete pause just for \u0026ldquo;refactoring\u0026rdquo;. So the challenge of refactoring is to slowly introduce new abstractions and gradually migrate away from the old abstractions to the new ones. I find that the following video explains this far better than an article could: https://www.youtube.com/watch?v=h6Cw9iCDVcU However, similar to the \u0026ldquo;consultancy\u0026rdquo; category of engineers, not all engineers are valued equally. Some engineers appear to be more \u0026ldquo;valued\u0026rdquo; by a company than others. You can see it from the kind of \u0026ldquo;interesting\u0026rdquo; problems thrown at such engineers, as well as their faster promotion cycles. Why is this so?\nI would boil it all down to this one term: Leverage. You as an engineer need to understand how your work impacts the company. Even if the impact is not in the monetary sense and is more in the \u0026ldquo;efficiency\u0026rdquo; sense, the value of such work still needs to be conveyed to upper management. The following video explains this far better: https://www.youtube.com/watch?v=SclqaNqqAV0\nPlatform Engineers # Platform engineers are the engineers who are usually far removed from the \u0026ldquo;revenue\u0026rdquo; earning services. However, they provide the core services that the rest of the company relies on, such as customer management/identity systems and billing systems. These systems are core to the business of the company; any of them failing could be detrimental to the product/company.\nThis means that these engineers tend to handle components that are challenging to manage. 
Such services need to scale well, be resilient to failures, and be less impacted by the feature requests made to win deals.\nNarrowing down what to learn # Let\u0026rsquo;s take a case where you are a frontend engineer seeking to jump to backend engineering work (frontend got a bit too boring?). Or maybe you are from another industry and are trying to jump to the software engineering track. When you go online and check the various YouTube videos/blogs, you will realize that there are many things you need to learn in order to make that jump. Should you just hunker down and start learning everything?\nThis is the main issue with a software engineering career nowadays. Every topic you research online is a rabbit hole waiting to be discovered. Reading up on just one subtopic will make you realize that you have a gap in another part of your knowledge. If you keep following these threads, there is a high chance you\u0026rsquo;ll just go around in circles, not fully understanding anything and only grabbing surface-level knowledge, which may not be as useful for building projects.\nThe following are my personal suggestions to help narrow down the amount of learning that you need to do:\nReferring to job roles before studying # This method is metaphorically similar to looking at a target and aiming for it. Instead of looking at the various technologies and attempting to understand the whole landscape, we look at the role directly and try to understand what it involves. From that, we can work out which technologies the job role requires, and use that to create our \u0026ldquo;study list\u0026rdquo;.\nLet\u0026rsquo;s take the example of applying for a DevOps role. 
The job requirements could mention the following: familiarity with Docker and Kubernetes; familiarity with Jenkins and Groovy scripts; strong scripting skills in Python and Bash; familiarity with cloud platforms such as AWS. Such job requirements immediately reduce the scope of what we need to study. Instead of reading up on ways of deploying applications to virtual machines (systemd etc.), we can focus on containerization technologies such as Docker. And since Python, Bash and Groovy scripts are mentioned, those define the languages and tools that we need to master.\nThis can be applied to almost any tech job; as time goes by, your previous roles may overlap with future roles, which makes it even easier to search for newer roles.\nPersonal Projects # Just reading up on concepts and tools is usually insufficient to internalize how a tool, library or programming language works. In my opinion, the best way to do so is to actually run it on your own workstation or try it out in various use cases.\nOne example from my personal experience is understanding how Kubernetes works. If we read online resources, they would just say that Kubernetes is a platform that orchestrates containers. If we had just read that and then reiterated it in our interviews, it would become clear that we don\u0026rsquo;t fully understand how the tool works or why an organization would want to use it.\nSo, if we want to understand Kubernetes, we need to understand what a container is.
And to understand what a container is, we may need to look into Docker containers, and from there look into how to create a Docker container, how an app relates to it, how to get an app and a database to work together on the platform, and what a good way to work with it would be.\nGaining that knowledge and understanding the complexity of a tool takes experience using it, and in my opinion there is no better way than to have an end-to-end experience with it. For Kubernetes, this involves creating an application, putting it into a Docker container and getting that container running in a Kubernetes cluster.\nWe can say the same for almost any piece of tech we may wish to learn. Let\u0026rsquo;s say we feel that serverless platforms offered by cloud providers will be a big deal in the future, with most companies using them. Should we stop at reading up on the offerings of the different platforms? Or should we go the extra mile and try deploying applications on a serverless platform? If we go with the latter, we would experience some of the issues of relying on such platforms (e.g. cold starts, dealing with dependencies, developing an app in a team-based environment without a local environment).\nJust do the interview # As the title of this section says, just go ahead with the interview. Going through an interview yields a whole bunch of learnings, although it involves hardening your heart against failing and looking bad in front of your interviewers.\nSo why this approach? If you only follow the approaches above (e.g. approach 1 of using job posts and learning technologies based on the job requirements), there is a very high likelihood that you would only cover the surface level concepts of those tools.
Surface level knowledge may not be enough for such roles; you might need to \u0026ldquo;go deeper\u0026rdquo; down the stack and have an appreciation for the underlying technologies that power the tool. You will only glean such insight by going through interviews. At the same time, regret at being unprepared for an interview may motivate you to learn more, thereby solidifying your knowledge and concepts further.\n","date":"5 January 2021","externalUrl":null,"permalink":"/charting-a-career-path-in-the-tech-world/","section":"Posts","summary":"DISCLAIMER: The following article is just an opinion. Naturally, each person have their own work experiences that they can use to project their future plans; so take the items in this article with a large spoonful of salt when applying it into your own perspective.\n","title":"Charting a career path in the tech world","type":"posts"},{"content":"There is actually plenty of work that needs to be done in order to continuously and consistently organize webinars in a meetup group. I am involved in one such group, and it takes quite a bit of effort to keep the group looking \u0026ldquo;alive\u0026rdquo; by continuously churning out webinars during this unique situation.\nBefore delving into the automation tool being built, it might be good to explore what needs to be done for each webinar:\nCreate the webinar event on the streaming platform.\nThe group happens to be on Meetup.com, so naturally the event also needs to be added there to let people know about it.\nCreate the calendar invite for organizers and speakers, mainly to get everyone\u0026rsquo;s schedules synced up for the live webinar.\nA problem that often comes up is that the webinar details are usually quite vague until quite close to the webinar date (details are only confirmed within the week, and that is for the optimistic cases).
If we want to keep the event details in sync between the streaming platform and Meetup.com, we waste a bit of time every time we add new updates to the event.\nIt becomes a pain after a while, so naturally one or two of the platforms will not be updated in time. This reduces the \u0026ldquo;marketing\u0026rdquo; of the webinar, which leads to less interest in it.\nRadical idea - Automating this thing # I didn\u0026rsquo;t exactly want to spend my free time updating various platforms as planning for a webinar goes on. Instead, wouldn\u0026rsquo;t a tool that handles the syncing be nice to have here?\nAnd that\u0026rsquo;s what is being trialed right here:\nhttps://github.com/hairizuanbinnoorazman/techmeetup\nThe tool is heavily inspired by how Kubernetes does things. We have some files that serve as our primary reference. Every hour, the tool reads the primary reference and then checks the \u0026ldquo;target platforms\u0026rdquo;, in this case the streaming platforms as well as Meetup. It does a GET request to check that whatever is on each platform matches what is defined in the primary reference; if it does not, it updates the details on that platform.\nThe tool fulfils my goals:\nAllow me to hand the syncing job to the tool and let it handle it, so that the webinar details are \u0026ldquo;always\u0026rdquo; up to date.\nExplore the various APIs, including Google APIs and auth, and see how they can all work together.\nThe tool accepts a large configuration file where one passes all the required application configuration - e.g.
how often to check and update the target platforms, and a switch to turn off features of the tool where necessary (in case the tool screws up something very, very badly).\nThe tool writes out the details of the events it manages into a file, which can then be edited by the user. Ideally, we want to touch this file as little as possible, because it is the \u0026ldquo;main\u0026rdquo; file used for comparison against the details on the target platforms. Currently, it is a huge pain to \u0026ldquo;create\u0026rdquo; an event, as plenty of copying and pasting needs to be done. Hopefully, some functionality will be added to reduce this effort in the future.\nUnfortunately, the tool is still young (so expect bugs), and I wouldn\u0026rsquo;t trust it with the more \u0026ldquo;critical\u0026rdquo; webinars. But for the more regular meetups, it should generally be fine to let the automation handle things.\nAs a plus, the binary also includes functionality to retrieve the list of links from a Google Slides presentation and write them out to a YAML file. After that, the user of the binary can \u0026ldquo;replace\u0026rdquo; the links on the slides. This functionality was built to do bulk link shortening on Google Slides presentations (another troublesome task, done to ensure links are not a huge mess on screen).\nThe ROADMAP forward # I\u0026rsquo;ve got plenty of ideas that I want to stuff into the tool, so that I can focus on the thing that actually benefits the community: content. Some of the things I want the tool to handle:\nInformation Consolidation. The community has held a whole bunch of events and brought in a whole bunch of speakers to deliver content.
The scarce resource here is speakers, so as much as possible we want to hold onto speaker details in the hope that we can invite them back when we need speakers for future events.\nUtilize other social media platforms. Unfortunately, the meetup group I\u0026rsquo;m involved in is not on the various social media platforms. There are a few reasons for this: it\u0026rsquo;s painful to keep using them, and it\u0026rsquo;s easy for disused accounts to be hacked (I\u0026rsquo;m looking at you, Twitter).\nI will probably come up with more ideas as I continue to work on the tool.\n","date":"1 October 2020","externalUrl":null,"permalink":"/automating-the-admin-work-when-organizing-webinars-in-a-meetup-group/","section":"Posts","summary":"There is actually plenty of work that needs to be done in order to continuously and consistently organize webinars in a meetup group. I am involved in one such group, and it takes quite a bit of effort to keep the group looking “alive” by continuously churning out webinars during this unique situation.\n","title":"Automating the admin work when organizing webinars in a meetup group","type":"posts"},{"content":"One typical way to package applications that are targeted for deployment into a Kubernetes environment is to utilize the Helm tool. Helm has been used widely enough that whole ecosystems support its usage. Refer to the website here: https://hub.helm.sh/\nIn a basic setup, a Helm chart deploys resources following a specific order set in the Helm codebase, e.g. namespaces first, then configmaps, etc. If we only follow that, developers have no control over how their applications are rolled out to the cluster. In a more complex scenario, e.g.
where a job needs to be deployed first to run a database migration before the actual deployment happens, we can\u0026rsquo;t just rely on the order set within the Helm codebase. We need the capability to control which resource is deployed to the cluster and when it is deployed (e.g. a pre-deployment step that gets the data/machine into a state that can accept the new version of the application). The mechanism for this is Helm hooks. A good reference is the documentation already available on the Helm website: https://helm.sh/docs/topics/charts_hooks/\nConsidering/Thinking about helm behaviours # Let\u0026rsquo;s say we have a sample application with a Helm chart containing the following snippets.\nIn /templates/configmap.yaml\napiVersion: v1 kind: ConfigMap metadata: name: basic-app-config data: game.properties: | lol: caca miao: zzz In /templates/deployment.yaml\napiVersion: apps/v1 kind: Deployment metadata: name: {{ include \u0026#34;basic-app.fullname\u0026#34; . }} spec: template: spec: volumes: - name: foo configMap: name: basic-app-config items: - key: \u0026#34;game.properties\u0026#34; path: \u0026#34;game.yaml\u0026#34; containers: - name: {{ .Chart.Name }} image: \u0026#34;{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}\u0026#34; imagePullPolicy: {{ .Values.image.pullPolicy }} volumeMounts: - name: foo mountPath: \u0026#34;/etc/foo\u0026#34; readOnly: true # ... Shortened version presented here to focus on specific points - volume configuration. Other resources managed by the chart are not presented here.\nLet\u0026rsquo;s say we take this chart and deploy it to a cluster.
And let\u0026rsquo;s say that eventually we need the configmap to be added as some sort of pre-hook (maybe we need to run a database migration before an application update).\nIn this case, we would add the Helm hook annotations to the configmap.\nNew /templates/configmap.yaml\napiVersion: v1 kind: ConfigMap metadata: name: basic-app-config annotations: \u0026#34;helm.sh/hook\u0026#34;: pre-install,pre-upgrade \u0026#34;helm.sh/hook-weight\u0026#34;: \u0026#34;-5\u0026#34; \u0026#34;helm.sh/hook-delete-policy\u0026#34;: before-hook-creation data: game.properties: | lol: caca miao: zzz newconfig: aaa // New configuration line added\nIn order for the pods to pick up the new configmap, we need to make some sort of change to the deployment file; a typical way is to add a timestamp as an annotation on the deployment\u0026rsquo;s pod template. With that, the pods are recreated and the new configuration can be loaded into them.\nIf we do so, we\u0026rsquo;ll see the following:\nNAME READY STATUS RESTARTS AGE yahoo-basic-app-565d9dc8c4-22nfj 1/1 Running 0 32m yahoo-basic-app-565d9dc8c4-jmknb 1/1 Running 0 32m yahoo-basic-app-565d9dc8c4-kkkrv 1/1 Running 0 32m yahoo-basic-app-565d9dc8c4-kksp6 1/1 Running 0 32m yahoo-basic-app-bd6fcc946-mvs8w 0/1 ContainerCreating 0 31m yahoo-basic-app-bd6fcc946-q2r6c 0/1 ContainerCreating 0 31m yahoo-basic-app-bd6fcc946-x8rdq 0/1 ContainerCreating 0 31m\nThe new pods are stuck in the ContainerCreating phase.
Describing one of the stuck pods shows the following:\nEvents: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 4m8s default-scheduler Successfully assigned default/yahoo-basic-app-bd6fcc946-mvs8w to gke-cluster-1-default-pool-41a5ca1d-5nh4 Warning FailedMount 2m5s kubelet, gke-cluster-1-default-pool-41a5ca1d-5nh4 Unable to mount volumes for pod \u0026#34;yahoo-basic-app-bd6fcc946-mvs8w_default(11a13ba5-0640-4766-96fb-7db3759a6cbc)\u0026#34;: timeout expired waiting for volumes to attach or mount for pod \u0026#34;default\u0026#34;/\u0026#34;yahoo-basic-app-bd6fcc946-mvs8w\u0026#34;. list of unmounted volumes=[foo]. list of unattached volumes=[foo yahoo-basic-app-token-dcvnm] Warning FailedMount 2m (x9 over 4m8s) kubelet, gke-cluster-1-default-pool-41a5ca1d-5nh4 MountVolume.SetUp failed for volume \u0026#34;foo\u0026#34; : configmap \u0026#34;basic-app-config\u0026#34; not found\nTo sum it up, none of the logs from the following components show anything significant or obvious: the Tiller component (if you use Helm v2), the Helm client tool (it won\u0026rsquo;t complain of any errors), the kubelet logs and the Docker logs.\nAn interesting thing to note is that the basic-app-config configmap is now missing from the cluster after the new release is deployed. So what\u0026rsquo;s happening here?\nApparently, this boils down to understanding that Helm actually tracks the resources it is supposed to manage. Pre-hook resources are technically not tracked as part of the main release (if we skip hooks, the resource would not be deployed at all).\nSo what probably happened was the following (we take v1 as the initial version, where the configmap has no Helm annotations, and v2 as the version with the Helm annotations):\nv1 was deployed, with the configmap as part of the main Helm release.\nThe configmap was then designated to be deployed as part of a pre-hook.\nv2 was deployed.\nThe new configmap was created as part of the pre-hook.\nHelm does a diff between the resources in the v2 release and the v1 release.
It finds that the configmap is no longer part of the release manifest, so it removes the configmap resource.\nThe pod is finally scheduled to a worker node.\nThe pod attempts to read the configmap; however, the resource has been deleted.\nThe kubelet complains that it is unable to mount the configmap volume.\nThis chain of events is actually the logical consequence of what each tool/platform does. We do want resources that are removed from our Helm chart to also be removed from the platform, but as a side effect, this weird confusion happens when we add Helm hook annotations without regard for the side effects they may cause.\nConclusion # If we come across such a scenario, we can either purge the Helm release or redeploy it once more; the problem then goes away (on the second redeploy, the configmap is left undeleted). This is a weird kind of bug that one can easily attribute to a flaky setup, and it is probably rare for anyone to reason about it.\nA sample codebase that you can use to try this out is available here:\nhttps://github.com/hairizuanbinnoorazman/Go_Programming/tree/master/Web/basicHelm\n","date":"11 September 2020","externalUrl":null,"permalink":"/tripping-over-helm-hooks/","section":"Posts","summary":"One typical way to package applications that are targeted for deployment into a Kubernetes environment is to utilize the Helm tool. Helm has been used widely enough that whole ecosystems support its usage. Refer to the website here: https://hub.helm.sh/\n","title":"Tripping over helm hooks","type":"posts"},{"content":"Loggers in codebases are generally taken for granted. We usually imagine that we\u0026rsquo;ll just choose a logger library, import it and then use it in code.
We would probably pass some configuration to the logger, maybe to reduce the amount of logs printed in production and thus the load on the log aggregation systems.\nUsually, this approach isn\u0026rsquo;t a problem. However, what happens if the logger library we picked turns out to be incompatible with our log aggregation systems? (Yes, Fluent Bit, Beats etc. can be configured to handle all kinds of logging formats, but it doesn\u0026rsquo;t make sense to do that on a per-component basis; it makes more sense for the platform teams to dictate general logging formats that application teams need to conform to.) With an incompatible logger, we would be forced to switch to a logging library that supports the required format. Changing loggers in application codebases is generally one of the most painful things to do; IMO, it\u0026rsquo;s almost akin to intellectual torture.\nAnother reason to have some sort of logger interface is when you\u0026rsquo;re sharing your project\u0026rsquo;s packages with other projects. Let\u0026rsquo;s say your code uses a hard-coded logger implementation within your project, and by default that logger prints all statements, including info and debug statements. Without an interface (an alternative is to accept a logger function, but that only allows you to pass a single logger function), the person calling your package has no control over what is logged. Just imagine the compiled components logging nicely formatted JSON and then suddenly switching to multi-line logs (because your project\u0026rsquo;s package decided to use them). It\u0026rsquo;s a very jarring experience and makes it hard to use said package properly.\nFortunately, Golang comes with the interface construct.
That allows us to plug in different logging systems, provided we code it that way.\nLogger Interface # Let\u0026rsquo;s say we have an HTTP handler.\ntype GetPage struct { logger logger.Logger pageDB page.Store } func (p GetPage) ServeHTTP(w http.ResponseWriter, r *http.Request) { p.logger.Info(\u0026#34;Start of GetPage handler\u0026#34;) defer p.logger.Info(\u0026#34;End of GetPage handler\u0026#34;) fmt.Fprintf(w, \u0026#34;Hello World: %s!\\n\u0026#34;, r.URL.Path) } Notice the logger.Logger field declared as part of the GetPage struct. If the logger is an interface, we can switch in different logger implementations depending on our use case.\n// Part of logger package within project type Logger interface { Debug(args ...interface{}) Debugf(format string, args ...interface{}) Info(args ...interface{}) Infof(format string, args ...interface{}) Warning(args ...interface{}) Warningf(format string, args ...interface{}) Error(args ...interface{}) Errorf(format string, args ...interface{}) } The above is an example of a logger interface. With that, as long as a logger implementation satisfies this interface, it can be passed into handlers such as GetPage.\nExtending to using test loggers # In the Visual Studio Code environment, you can run Golang unit tests quite easily. However, the code in some of these functions can get particularly complex; a variable may go through many state transitions after a whole bunch of functions manipulate it. One way to debug this is to comment out large sections of code just to be able to view the current state of some variable, which can then be logged out in tests.\nJust for context, using your default logger and logging the value out doesn\u0026rsquo;t seem to work as expected; the logs don\u0026rsquo;t get printed.\nLet\u0026rsquo;s say we have the following implementation:\ntype LoggerForTests struct { Tester *testing.T } func (l LoggerForTests) Debug(args ...interface{}) { l.Tester.Log(args...)
} func (l LoggerForTests) Debugf(format string, args ...interface{}) { l.Tester.Logf(format, args...) } func (l LoggerForTests) Info(args ...interface{}) { l.Tester.Log(args...) } func (l LoggerForTests) Infof(format string, args ...interface{}) { l.Tester.Logf(format, args...) } func (l LoggerForTests) Warning(args ...interface{}) { l.Tester.Log(args...) } func (l LoggerForTests) Warningf(format string, args ...interface{}) { l.Tester.Logf(format, args...) } func (l LoggerForTests) Error(args ...interface{}) { l.Tester.Log(args...) } func (l LoggerForTests) Errorf(format string, args ...interface{}) { l.Tester.Logf(format, args...) } This is where a logger interface, accepted and used by your struct/function, allows people to plug in the above implementation, which is mainly targeted at printing logs during testing.\nJust additional thoughts # After reading and playing around with several Golang codebases, I currently hold the following opinion: if a technical decision needs to be made, it\u0026rsquo;s best to hide it behind an interface so that alternative solutions can be used in the future.\nSome examples I can easily think of at the moment are datastores and loggers. If I discover more cases in the future, I\u0026rsquo;ll add them to the list here.\nBut as with all things, take this advice with a grain of salt. Introducing interfaces this early into your codebase naturally increases its complexity quite a bit.
Sometimes, rather than using an interface, a company may decide that implementations should be swapped at the network level (calling different endpoints etc.), in which case all this complexity in the codebase would just be plain old code bloat.\n","date":"16 August 2020","externalUrl":null,"permalink":"/implications-for-having-switchable-loggers/","section":"Posts","summary":"Loggers in codebases are generally taken for granted. We usually imagine that we’ll just choose a logger library, import it and then use it in code. We would probably pass some configuration to the logger, maybe to reduce the amount of logs printed in production and thus the load on the log aggregation systems.\n","title":"Implications for having switchable loggers","type":"posts"},{"content":"NOTE: THIS POST IS NOTES I\u0026rsquo;M TAKING FOR MYSELF WHILE ON THIS JOURNEY. TAKE IT WITH A BAG OF SALT. NOT ALL THINGS MENTIONED HERE ARE TRUE - DO YOUR OWN DUE DILIGENCE\nThis topic is really hard to wrap your head around. Generally, most people don\u0026rsquo;t need to dive this deep to understand how Kubernetes works, but let\u0026rsquo;s just say I got a tad curious. I was itching to learn how to write a storage provisioner that utilizes CSI.\nLet\u0026rsquo;s start with the list of links/ideas that we need to grasp:\nIdeas to understand # Kubernetes overall architecture # Users communicate the needs of their applications to the Kubernetes API server via kubectl or alternative tools. The scheduler then assigns the required workloads to nodes, and the kubelet (the binary that runs on each node) learns the new \u0026ldquo;state of the world\u0026rdquo; of the cluster from the API server.
It then becomes the kubelet\u0026rsquo;s job to make it happen.\nIn the case of CSI, it would seem that the kubelet talks to the storage driver running on the node. In cloud environments, that would mean the storage provisioner communicates with the cloud APIs to create a virtual disk that gets attached to the node, which then makes the storage available to the container.\nCommunicating over sockets # Within the CSI spec, there are mentions of needing to pass socket paths (e.g. tcp://\u0026hellip; or unix:///\u0026hellip;). The components talk over gRPC and need endpoints to communicate with.\nhttps://eli.thegreenplace.net/2019/unix-domain-sockets-in-go/\nThis seems to be a more efficient way for processes on the same machine to communicate with each other. As mentioned in the article, TCP has overheads that matter for communication consisting of small messages, so it makes complete sense for such communication to be done over Unix sockets. Also, the communication goes from the kubelet to the storage driver, and both binaries/processes are local to the node, so there is little need for the communication channel to accept connections from outside the node.\nVolumes mounted in containers # Volumes in Docker are mounted into the container via a Linux system call: mount\nhttps://github.com/moby/sys/blob/master/mount/mounter_linux.go#L30\nEven if the storage systems are over NFS, there is code to handle such scenarios.\nA basic example of calling it via the CLI:\nhttps://linuxize.com/post/how-to-mount-and-unmount-file-systems-in-linux/\nMore complex scenarios when it comes to mounts:\nhttps://unix.stackexchange.com/questions/198542/what-happens-when-you-mount-over-an-existing-folder-with-contents\nHowever, if we read further, Kubernetes doesn\u0026rsquo;t seem to use the mounting capabilities provided by Docker to mount volumes.
It has its own mechanism for mounting volumes into the container, which would explain how other runtimes without Docker\u0026rsquo;s easy volume support can still run in k8s.\nhttps://kubernetes.io/docs/setup/production-environment/container-runtimes/\nGRPC # gRPC seems to be the most common way for CSI components to communicate with each other.\nTLDR version: it sends binary (Protocol Buffers) messages over HTTP/2, which usually runs on top of TCP.\nThe reasons for doing this are fairly obvious. There is reduced overhead in terms of what gets put over the wire, and it should also mean fewer resources are required to marshal and unmarshal the content. The component can use the information immediately without spending much effort parsing it.\nUnderstanding CSI Specification # This is the hardest of all the tasks. A whole variety of concepts need to be understood before one can sanely develop a storage provisioner with CSI.\nThe main doc for this:\nhttps://kubernetes-csi.github.io/docs/developing.html\nThe specification itself is here:\nhttps://github.com/container-storage-interface/spec/blob/master/spec.md#rpc-interface\nHowever, the docs only cover higher-level ideas, not the details. To understand those, it is good to read the code of a sample storage driver: hostpath-plugin\nhttps://github.com/kubernetes-csi/csi-driver-host-path\nThis is one of the better blog posts describing the end-to-end process of what happens when a CSI plugin is used.\nhttps://medium.com/velotio-perspectives/kubernetes-csi-in-action-explained-with-features-and-use-cases-4f966b910774\nThe hostpath driver dynamically creates volumes on the host file system of the Kubernetes nodes.\nThis URL is for a tool that helps test CSI plugins:\nhttps://github.com/kubernetes-csi/csi-test/tree/master/pkg/sanity\nThis is a mock implementation of a CSI plugin; it has no real functionality, but it contains the application structure a CSI plugin needs in order to work.
Take note of the endpoints needed in order to create one.\nhttps://github.com/rexray/gocsi/tree/master/mock\nIt also comes with a CSI client for testing the calls made to the plugin:\nhttps://github.com/rexray/gocsi/tree/master/csc\nBlock storage on a file # This is mainly to understand how the hostpath plugin is able to support block storage. Raw block volume support reached beta in Kubernetes 1.13. As mentioned in the article, such storage options are meant for more specialized workloads, e.g. databases.\nhttps://kubernetes.io/blog/2019/03/07/raw-block-volume-support-to-beta/\nhttps://www.jamescoyle.net/how-to/2096-use-a-file-as-a-linux-block-device\nWithin the hostpath plugin, fallocate is used when the requested volume is of the raw block type rather than a mounted filesystem. Alternative approaches are dd and truncate, and the following article covers why fallocate is used instead.\nhttp://infotinks.com/dd-fallocate-truncate-making-big-files-quick/\n","date":"1 May 2020","externalUrl":null,"permalink":"/attempting-to-understand-csi-kubernetes/","section":"Posts","summary":"NOTE: THIS POST IS NOTES I’M TAKING FOR MYSELF WHILE ON THIS JOURNEY. TAKE IT WITH A BAG OF SALT. NOT ALL THINGS MENTIONED HERE ARE TRUE - DO YOUR OWN DUE DILIGENCE\n","title":"Attempting to understand CSI Kubernetes","type":"posts"},{"content":"There are various tools out there that help with screen recording. However, many of these tools/sites provide recordings at a price. Maybe you can only record a certain number of hours of video per month? Or the videos are not downloadable (unless you pay) and expire after a set period.\nLuckily, there are awesome free tools out there that provide such functionality.\nFor recording the screen while doing demos, one can utilize OBS (Open Broadcaster Software) - https://obsproject.com/.
Although the purpose in this case is pretty simple (we just want to record our screen while doing technical demos), OBS can go way beyond that. You can easily take multiple video and audio streams and mix them into a single video, or even stream that video straight to YouTube.\nHowever, even such a tool doesn\u0026rsquo;t provide the capability to output GIFs (maybe we just want to put an image on the site rather than upload a whole video, manage its lifecycle and deal with the issues that arise from hosting videos).\nOne can utilize ffmpeg to do so.\nFor convenience, one can use ffmpeg packaged in a Docker image to do this work; see the command below:\n# Windows environment docker run --rm -v c:/Users/USER/Videos:/temp jrottenberg/ffmpeg -i file:/temp/lol.mkv -r 15 file:/temp/lol.gif ","date":"15 April 2020","externalUrl":null,"permalink":"/recording-demos-with-free-tooling/","section":"Posts","summary":"There are various tools out there that help with screen recording. However, many of these tools/sites provide recordings at a price. Maybe you can only record a certain number of hours of video per month? Or the videos are not downloadable (unless you pay) and expire after a set period.\n","title":"Recording demos with free tooling","type":"posts"},{"content":"While trying to understand how the components that deal with the Container Storage Interface (CSI) in Kubernetes work, I came across mentions of these components using Unix domain sockets to communicate with each other. A quick read on why Unix domain sockets are used reveals that they reduce the overhead when such components talk to each other locally.
If the components needed to communicate across multiple nodes, TCP would have been used instead.\nThe following blog post is a good reference on using Unix domain sockets for communication in Go.\nhttps://eli.thegreenplace.net/2019/unix-domain-sockets-in-go/\nRunning it on a local machine # With reference to the following gist on GitHub:\nhttps://gist.github.com/hakobe/6f70d69b8c5243117787fd488ae7fbf2\nWe can run the following on a local machine with bash available (macOS and Linux). Save the following as main.go\npackage main import ( \u0026#34;io\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net\u0026#34; \u0026#34;os\u0026#34; \u0026#34;os/signal\u0026#34; \u0026#34;syscall\u0026#34; ) // NOTE - CHANGE THE SOCKET FILE LOCATION ACCORDINGLY var SocketFile = \u0026#34;/tmp/go.sock\u0026#34; func echoServer(c net.Conn) { defer c.Close() buf := make([]byte, 512) for { nr, err := c.Read(buf) if err != nil { if err == io.EOF { log.Println(\u0026#34;END OF FILE\u0026#34;) return } log.Println(\u0026#34;error in trying to read data:\u0026#34;, err) return } data := buf[0:nr] log.Println(\u0026#34;Server got:\u0026#34;, string(data)) _, err = c.Write(data) if err != nil { log.Println(\u0026#34;Writing client error:\u0026#34;, err) return } } } func main() { log.Println(\u0026#34;Starting echo server\u0026#34;) ln, err := net.Listen(\u0026#34;unix\u0026#34;, SocketFile) if err != nil { log.Fatal(\u0026#34;Listen error: \u0026#34;, err) } sigc := make(chan os.Signal, 1) signal.Notify(sigc, os.Interrupt, syscall.SIGTERM) go func(ln net.Listener, c chan os.Signal) { sig := \u0026lt;-c log.Printf(\u0026#34;Caught signal %s: shutting down.\u0026#34;, sig) ln.Close() os.Exit(0) }(ln, sigc) for { fd, err := ln.Accept() if err != nil { log.Fatal(\u0026#34;Accept error: \u0026#34;, err) } go echoServer(fd) } } We can run it either by building the Go binary and running it, or by simply running go run main.go\nWe can then run the following command, which sends \u0026ldquo;foo\u0026rdquo; to
the application\necho -e \u0026#39;\\x66\\x6f\\x6f\u0026#39; | nc -U /tmp/go.sock Dockerized app that uses unix domain sockets # Let\u0026rsquo;s say we try to dockerize it.\nFROM golang:1.13 # This is so that internally, we can exec in and test it from inside RUN apt update \u0026amp;\u0026amp; apt install -y netcat-openbsd ADD . . RUN go build -o app ./main.go CMD [\u0026#34;/go/app\u0026#34;] We can then run the following commands:\n# Make a tmp folder in current directory mkdir ./tmp # Build container docker build -t lol . # Run container - and mount volume into it # You should see the go.sock file created in the ./tmp folder that you created above docker run -v $(pwd)/tmp:/tmp lol With that, we have a running container that runs the application above. It creates a go.sock file within the tmp folder that you mounted into the container. However, on macOS, if we try to run the following command from the host to send messages to the socket, it doesn\u0026rsquo;t work:\necho -e \u0026#39;\\x66\\x6f\\x6f\u0026#39; | nc -U $(pwd)/tmp/go.sock The reason for this seems to be the following:\nhttps://forums.docker.com/t/cant-connect-to-host-listening-unix-socket-from-container-vm/15526/2\nSockets created by Linux containers can\u0026rsquo;t be used from a macOS host to communicate with the containers. The only exception here would be docker.sock, and that is because special effort has been made to make it work.\nHowever, if you do so on a Linux-based host system, it works fine and the messages get sent across as expected.\nBut if this still needs to be tested on macOS, we can do the following:\nRun another instance of the docker container built above; we will call this container 2. That gives us another Linux container to work with. Run docker exec -it ... /bin/bash on the second container. Run docker logs ... to get the logs from container 1. Run echo -e '\\x66\\x6f\\x6f' | nc -U /tmp/go.sock in container 2.
In container 1\u0026rsquo;s logs, we should see that it received a message containing foo. Applying it back to what is seen in K8s CSI components # The set of components that provide storage plugins to Kubernetes via CSI (namely the hostpath-plugin) includes a statefulset with a volume bound to it. The statefulset has 3 containers, and the socket file is mounted into all 3 of them, which is how they communicate with each other. The volume can be read and modified by any of the 3 containers.\nRefer to the following yaml file:\nhttps://github.com/kubernetes-csi/csi-driver-host-path/blob/master/deploy/kubernetes-1.15/hostpath/csi-hostpath-plugin.yaml\n","date":"12 April 2020","externalUrl":null,"permalink":"/dockerizing-application-that-use-unix-sockets/","section":"Posts","summary":"While trying to understand how the components that deal with the Container Storage Interface (CSI) in Kubernetes work, I came across mentions of the components using Unix domain sockets to communicate with each other. A quick read on why suggests that Unix domain sockets are used to reduce overhead when such components talk to each other locally. If the components needed to communicate across multiple nodes, TCP would have been used instead.\n","title":"Dockerizing application that use unix sockets","type":"posts"},{"content":"Let\u0026rsquo;s say we have a set of applications that were designed as a set of microservices. Each application would generally focus on one specific domain, contributing to the overall goal of the platform. However, for the platform to work properly, the applications generally need to work together, which involves the applications calling each other.\nHowever, having the applications distributed in that manner makes it hard to understand the platform as a whole.
It becomes difficult to analyze which application is the bottleneck when some of the web application\u0026rsquo;s responses are slow. Fortunately, there are various tools nowadays that can help with this; one example is Jaeger.\nThe following example takes a set of applications deployed onto a Kubernetes platform and tries to analyze the dependencies between the applications and the breakdown of timings for the application response. We would need to do the following:\nCreate an application that has opentracing libs and functions embedded in it Deploy an instance of Jaeger on the Kubernetes cluster Have the application report to the Jaeger instance so that we can analyze it After doing these steps, we should be able to see a dashboard like this:\nDeveloping application with tracing embedded in it # Although there are already efforts to merge the OpenTracing and OpenCensus standards to form OpenTelemetry, it\u0026rsquo;s still going to take a while before the implementation for that comes out in the wild.
For now, let\u0026rsquo;s just look at how to understand application behaviour using the opentracing libraries.\nThe complete set of examples from current implementations for Golang applications can be viewed here: https://github.com/jaegertracing/jaeger-client-go/blob/master/config/example_test.go\nSo with that in mind, let\u0026rsquo;s try to create an example Golang application that allows us to do the following:\nUse Environment Variables to control response times of the application Use Environment Variables to control whether the application should call a downstream application Use Environment Variables to control what to reply to users that request a response from the service Use Environment Variables to control the service name of the application that is shown on Jaeger Deploy the application onto a Kubernetes platform This would be the result of such requirements (the opentracing library is introduced here as well)\nFor convenience, we\u0026rsquo;ll be using Google Cloud Platform here as an example of a target Kubernetes cluster.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;os\u0026#34; \u0026#34;strconv\u0026#34; \u0026#34;time\u0026#34; \u0026#34;github.com/opentracing/opentracing-go\u0026#34; \u0026#34;github.com/opentracing/opentracing-go/ext\u0026#34; \u0026#34;github.com/uber/jaeger-client-go\u0026#34; jaegercfg \u0026#34;github.com/uber/jaeger-client-go/config\u0026#34; jaegerlog \u0026#34;github.com/uber/jaeger-client-go/log\u0026#34; \u0026#34;github.com/uber/jaeger-lib/metrics\u0026#34; ) func handler(w http.ResponseWriter, r *http.Request) { tracer := opentracing.GlobalTracer() spanCtx, _ := tracer.Extract(opentracing.HTTPHeaders, opentracing.HTTPHeadersCarrier(r.Header)) serverSpan := tracer.StartSpan(\u0026#34;server\u0026#34;, ext.RPCServerOption(spanCtx)) defer serverSpan.Finish() log.Print(\u0026#34;Hello world received a request.\u0026#34;) 
defer log.Print(\u0026#34;End hello world request\u0026#34;) target := os.Getenv(\u0026#34;TARGET\u0026#34;) if target == \u0026#34;\u0026#34; { target = \u0026#34;NOT SPECIFIED\u0026#34; } waitTimeEnv := os.Getenv(\u0026#34;WAIT_TIME\u0026#34;) waitTime, _ := strconv.Atoi(waitTimeEnv) log.Printf(\u0026#34;Sleeping for %v\u0026#34;, waitTime) time.Sleep(time.Duration(waitTime) * time.Second) fmt.Fprintf(w, \u0026#34;Hello: %s!\\n\u0026#34;, target) clientURL := os.Getenv(\u0026#34;CLIENT_URL\u0026#34;) if clientURL != \u0026#34;\u0026#34; { url := clientURL req, _ := http.NewRequest(\u0026#34;GET\u0026#34;, url, nil) clientSpan := tracer.StartSpan(\u0026#34;client\u0026#34;, opentracing.ChildOf(serverSpan.Context())) defer clientSpan.Finish() ext.SpanKindRPCClient.Set(clientSpan) ext.HTTPUrl.Set(clientSpan, url) ext.HTTPMethod.Set(clientSpan, \u0026#34;GET\u0026#34;) tracer.Inject(clientSpan.Context(), opentracing.HTTPHeaders, opentracing.HTTPHeadersCarrier(req.Header)) resp, err := http.DefaultClient.Do(req) if err != nil { log.Printf(\u0026#34;Downstream request failed: %v\u0026#34;, err) return } resp.Body.Close() } } func main() { log.Print(\u0026#34;Hello world sample started.\u0026#34;) jaegerCollector := os.Getenv(\u0026#34;JAEGER_COLLECTOR\u0026#34;) // Our Jaeger instance will be named simplest. 
// The Jaeger operator then creates a service endpoint called simplest-collector, which we can send our traces/spans to if jaegerCollector == \u0026#34;\u0026#34; { jaegerCollector = \u0026#34;http://simplest-collector:14268/api/traces\u0026#34; } cfg := jaegercfg.Configuration{ Sampler: \u0026amp;jaegercfg.SamplerConfig{ Type: jaeger.SamplerTypeConst, Param: 1, }, Reporter: \u0026amp;jaegercfg.ReporterConfig{ CollectorEndpoint: jaegerCollector, LogSpans: true, }, } jLogger := jaegerlog.StdLogger jMetricsFactory := metrics.NullFactory serviceName := os.Getenv(\u0026#34;SERVICE_NAME\u0026#34;) if serviceName == \u0026#34;\u0026#34; { serviceName = \u0026#34;NOT SPECIFIED\u0026#34; } // Initialize tracer with a logger and a metrics factory closer, err := cfg.InitGlobalTracer( serviceName, jaegercfg.Logger(jLogger), jaegercfg.Metrics(jMetricsFactory), ) if err != nil { log.Fatalf(\u0026#34;Could not initialize jaeger tracer: %v\u0026#34;, err) } defer closer.Close() http.HandleFunc(\u0026#34;/\u0026#34;, handler) log.Fatal(http.ListenAndServe(\u0026#34;:8080\u0026#34;, nil)) } This would be the Dockerfile for it\n# Example of golang module name: github.com/sampleusernameongithub/basicWithTracing # And with that example - name of binary that would be built: basicWithTracing FROM golang ADD . /go/src/\u0026lt;INSERT GOLANG MODULE NAME HERE\u0026gt; WORKDIR /go/src/\u0026lt;INSERT GOLANG MODULE NAME HERE\u0026gt; RUN go get RUN go install \u0026lt;INSERT GOLANG MODULE NAME HERE\u0026gt; ENTRYPOINT [\u0026#34;/go/bin/\u0026lt;NAME OF BINARY\u0026gt;\u0026#34;] EXPOSE 8080 And with that, we can run the following set of commands:\ndocker build -t gcr.io/\u0026lt;GCP PROJECT ID\u0026gt;/basicwithtracing:v1 . docker push gcr.io/\u0026lt;GCP PROJECT ID\u0026gt;/basicwithtracing:v1 This gives us a container image that we can pull into GKE for testing\nDeploy Jaeger # There are various ways to deploy Jaeger instances.
However, a more modern way is to deploy a Jaeger operator: a controller application deployed on the cluster that provides a \u0026ldquo;Jaeger\u0026rdquo; resource on our cluster. Any user can then request a Jaeger resource.\nWe can deploy such a Jaeger operator via a helm chart. (We would need to install helm first, though. The commands below use Helm 2, which requires Tiller.)\nkubectl -n kube-system create serviceaccount tiller kubectl create clusterrolebinding tiller \\ --clusterrole cluster-admin \\ --serviceaccount=kube-system:tiller helm init --service-account tiller helm repo add jaegertracing https://jaegertracing.github.io/helm-charts helm install --name my-release jaegertracing/jaeger-operator We can then create the following Jaeger resource\n# Saved as jaeger.yaml apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simplest Run the command: kubectl apply -f jaeger.yaml\nAnd with that we should see the following if we run kubectl get pods\nNAME READY STATUS RESTARTS AGE my-release-jaeger-operator-6879c898c6-8lxvv 1/1 Running 0 1h simplest-569dc8589b-8xjjl 1/1 Running 0 1h These should be deployed in the \u0026ldquo;default\u0026rdquo; namespace, unless your default namespace is set to something else\nAnd we would see this if we run kubectl get svc\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.8.0.1 \u0026lt;none\u0026gt; 443/TCP 2h my-release-jaeger-operator-metrics ClusterIP 10.8.1.111 \u0026lt;none\u0026gt; 8383/TCP,8686/TCP 1h simplest-agent ClusterIP None \u0026lt;none\u0026gt; 5775/UDP,5778/TCP,6831/UDP,6832/UDP 1h simplest-collector ClusterIP 10.8.1.210 \u0026lt;none\u0026gt; 9411/TCP,14250/TCP,14267/TCP,14268/TCP 1h simplest-collector-headless ClusterIP None \u0026lt;none\u0026gt; 9411/TCP,14250/TCP,14267/TCP,14268/TCP 1h simplest-query ClusterIP 10.8.9.49 \u0026lt;none\u0026gt; 16686/TCP 1h Note the simplest-query and simplest-collector services: we would send our traces/spans to
the simplest-collector, and then view the results via the simplest-query service endpoint.\nTesting the whole thing out # We can now use the following yaml, which deploys 3 instances of the same app with different configurations. The different configurations reveal the dependencies between the services, like in the graph above. (These manifests use the extensions/v1beta1 Deployment API, which was removed in Kubernetes 1.16; on newer clusters, use apps/v1 and add a spec.selector that matches the pod labels.)\napiVersion: extensions/v1beta1 kind: Deployment metadata: labels: run: app name: app namespace: default spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: labels: run: app spec: containers: - image: gcr.io/\u0026lt;GCP PROJECT ID\u0026gt;/basicwithtracing:v1 imagePullPolicy: Always name: app env: - name: WAIT_TIME value: \u0026#34;2\u0026#34; - name: TARGET value: \u0026#34;MIAO\u0026#34; - name: SERVICE_NAME value: app - name: CLIENT_URL value: \u0026#34;http://app2:8080\u0026#34; dnsPolicy: ClusterFirst restartPolicy: Always --- apiVersion: extensions/v1beta1 kind: Deployment metadata: labels: run: app2 name: app2 namespace: default spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: labels: run: app2 spec: containers: - image: gcr.io/\u0026lt;GCP PROJECT ID\u0026gt;/basicwithtracing:v1 imagePullPolicy: Always name: app2 env: - name: WAIT_TIME value: \u0026#34;1\u0026#34; - name: TARGET value: \u0026#34;MIAO\u0026#34; - name: SERVICE_NAME value: app2 - name: CLIENT_URL value: \u0026#34;http://app3:8080\u0026#34; dnsPolicy: ClusterFirst restartPolicy: Always --- apiVersion: v1 kind: Service metadata: labels: run: app2 name: app2 spec: ports: - port: 8080 protocol: TCP targetPort: 8080 selector: run: app2 type: ClusterIP --- apiVersion: extensions/v1beta1 kind: Deployment metadata: labels: run: app3 name: app3 namespace: default spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 
10 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: labels: run: app3 spec: containers: - image: gcr.io/\u0026lt;GCP PROJECT ID\u0026gt;/basicwithtracing:v1 imagePullPolicy: Always name: app3 env: - name: WAIT_TIME value: \u0026#34;1\u0026#34; - name: TARGET value: \u0026#34;MIAO\u0026#34; - name: SERVICE_NAME value: app3 dnsPolicy: ClusterFirst restartPolicy: Always --- apiVersion: v1 kind: Service metadata: labels: run: app3 name: app3 spec: ports: - port: 8080 protocol: TCP targetPort: 8080 selector: run: app3 type: ClusterIP These create 3 deployments: app, app2 and app3. app2 and app3 are exposed via services, which allow other pods in the cluster to contact them by service name (see the CLIENT_URL environment config in the deployment yaml file above).\nTo test this and see such an analysis, we would need to do the following:\nGo into one of the app pods and send a request to localhost:8080 Port forward the simplest-query service to localhost in order to view the dashboard # Run to get pod name to enter into kubectl get pods # Run this to get into the bash of one of app\u0026#39;s pods in order to run curl commands etc kubectl exec -it \u0026lt;pod name\u0026gt; -- /bin/bash # Run this within the pod in order to begin the cascading requests and send spans to Jaeger curl localhost:8080 To view the dashboard, we would need to run the following:\nkubectl port-forward service/simplest-query 8088:16686 This forwards port 16686 of simplest-query to port 8088 on the local computer. Opening localhost:8088 on the local computer shows the Jaeger dashboard\n","date":"4 April 2020","externalUrl":null,"permalink":"/trying-distributed-tracing-with-jaeger/","section":"Posts","summary":"Let’s say we have a set of applications that were designed as a set of microservices.
Each application would generally focus on one specific domain, contributing to the overall goal of the platform. However, for the platform to work properly, the applications generally need to work together, which involves the applications calling each other.\n","title":"Trying Distributed Tracing with Jaeger","type":"posts"},{"content":"This blog post is still being updated\nVarious cloud providers have started offering serverless containers as a service. With such a service, developers just create a container, pass it over to the cloud provider and then forget about it. The cloud provider deals with scaling, provisioning of resources to host the applications, deployment, monitoring, etc.\nSome such services are:\nGoogle Cloud Run Pivotal Function Service Underneath these services lie various frameworks; some examples are:\nKnative Openfaas These frameworks operate on top of various other tools, orchestrated together to work harmoniously (albeit maybe a little fragile?) to provide a simplified developer experience: developers just focus on delivering their application in a docker container and let the platform handle the rest.\nIn this post, we will cover how to deploy Knative, which powers the Google Cloud Run product.\nDeploying a Kubernetes Cluster # First, we need to deploy a Kubernetes cluster. There are various ways to do so these days:\nkubespray kops kubeadm Managed Kubernetes Clusters on Cloud Providers GKE on Google Cloud Platform AKS on Azure EKS on AWS (Managed Kubernetes platform) Digital Ocean Managed Kubernetes Platform Naturally, the easiest are the ones provided by Cloud Providers.\nIn our case, let\u0026rsquo;s say we do it manually via kubeadm: we would first need to create 3 VMs on Google Cloud. We would then need to run the following commands in sequence to get the Kubernetes cluster up and running.
The first part is to install the container runtime on the machines.\nTo support NodePort services in the Kubernetes cluster we are creating, we need to add a network tag to all of the VMs. Network Tag: \u0026ldquo;nodeports\u0026rdquo;. Ports 30000-32767 need to be opened for these.\nAt the same time, to provide kubectl access to the cluster from the outside world, we need to create another network tag that opens the firewall to port 6443 for these instances. We name this network tag \u0026ldquo;kube-api\u0026rdquo;.\nAlso, since the GCE instances will contact various Google Cloud Platform APIs to create the relevant volumes/load balancers, the instances need additional permissions. For more granular control, you can follow the blog post referenced below, but for simplicity\u0026rsquo;s sake, we will just give the instances full API access.\nSo to summarize, we need to do the following manual steps in the Google Cloud console GUI:\nCreate nodeports firewall rule Create kube-api firewall rule Create VMs with the firewall rules configured AND full access to Google APIs Here are some additional references for getting a Kubernetes cluster up on GCE VMs.\nReferences: https://medium.com/@stephane.beuret/kubeadm-on-gce-14df27d67bf5\n# Install Docker # Reference: https://docs.docker.com/install/linux/docker-ce/debian/ sudo apt-get update sudo apt-get install -y \\ apt-transport-https \\ ca-certificates \\ curl \\ gnupg2 \\ software-properties-common curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add - sudo apt-key fingerprint 0EBFCD88 sudo add-apt-repository \\ \u0026#34;deb [arch=amd64] https://download.docker.com/linux/debian \\ $(lsb_release -cs) \\ stable\u0026#34; sudo apt-get update sudo apt-get install -y docker-ce docker-ce-cli containerd.io The next step is to install kubeadm, the tool that helps install the Kubernetes
binaries on the machines.\n# Installing kubeadm # Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ # Switch iptables to the legacy backend (needed on Debian 10; can likely be skipped on Debian 9) sudo apt-get install -y iptables arptables ebtables sudo update-alternatives --set iptables /usr/sbin/iptables-legacy sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy sudo update-alternatives --set arptables /usr/sbin/arptables-legacy sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy sudo apt-get update \u0026amp;\u0026amp; sudo apt-get install -y apt-transport-https curl curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - cat \u0026lt;\u0026lt;EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list deb https://apt.kubernetes.io/ kubernetes-xenial main EOF sudo apt-get update sudo apt-get install -y kubelet kubeadm kubectl sudo apt-mark hold kubelet kubeadm kubectl # Not necessary for debian machines sudo systemctl daemon-reload sudo systemctl restart kubelet # Run the following command to prep it for weavenet CNI use sudo sysctl net.bridge.bridge-nf-call-iptables=1 Save the following as /etc/kubernetes/cloud-config on all 3 nodes\n[Global] project-id = \u0026#34;XXXX\u0026#34; node-tags = nodeports node-instance-prefix = \u0026#34;test\u0026#34; multizone = true From this point onwards, we need to save files and run commands on specific machines.
Let\u0026rsquo;s call each machine either a master node or a worker node.\nSave this as gce.yaml on the machine designated as the master node.\nNote: We need to create a Google Compute Engine instance group (test-group-manager), since a load balancer can only be attached to instance groups\n# Configuration: https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2 apiVersion: kubeadm.k8s.io/v1beta2 kind: InitConfiguration bootstrapTokens: - groups: - system:bootstrappers:kubeadm:default-node-token token: 123456.test123456789012 ttl: 24h0m0s usages: - signing - authentication nodeRegistration: kubeletExtraArgs: cloud-provider: \u0026#34;gce\u0026#34; cloud-config: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; taints: [] --- apiVersion: kubeadm.k8s.io/v1beta2 kind: ClusterConfiguration # networking: # podSubnet: \u0026#34;10.32.0.0/12\u0026#34; apiServer: certSANs: - X.X.X.X # Public IP Address of VM machine that is meant to be master - X.X.X.X # Private IP Address of VM machine that is meant to be master - 10.96.0.1 extraArgs: cloud-provider: \u0026#34;gce\u0026#34; cloud-config: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; extraVolumes: - name: cloud hostPath: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; mountPath: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; controllerManager: extraArgs: cloud-provider: \u0026#34;gce\u0026#34; cloud-config: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; extraVolumes: - name: cloud hostPath: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; mountPath: \u0026#34;/etc/kubernetes/cloud-config\u0026#34; With that, we can now run kubeadm to initialize the cluster and start the required Kubernetes services.\n# Run it on master node sudo su kubeadm init --config gce.yaml Save the following as join.yaml on each worker node\napiVersion: kubeadm.k8s.io/v1beta2 kind: JoinConfiguration discovery: bootstrapToken: apiServerEndpoint: \u0026#34;X.X.X.X:6443\u0026#34; token: 123456.test123456789012 
unsafeSkipCAVerification: true nodeRegistration: kubeletExtraArgs: cloud-provider: \u0026#34;gce\u0026#34; taints: [] Then run this command on each of your worker nodes to form the full cluster\nkubeadm join --config join.yaml The next step is to install the networking overlay and to allow pods to be scheduled on the master node.\nexport KUBECONFIG=/etc/kubernetes/admin.conf kubectl apply -f \u0026#34;https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d \u0026#39;\\n\u0026#39;)\u0026#34; # Flannel network (not fully tested) # kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml # This is not necessary in our case as we already removed the taints in our kubeadm configuration kubectl taint nodes --all node-role.kubernetes.io/master- Let\u0026rsquo;s deploy some apps to see if the cluster works. This step is the most vital one before proceeding. If any of these \u0026ldquo;tests\u0026rdquo; fail, you will not be able to deploy Istio or Knative.\n# Deploy a service and ensure it can connect to internet kubectl run --image=nginx --port=80 nginx kubectl exec -it \u0026lt;nginx pod name\u0026gt; -- /bin/bash # Within container apt update # If this fails -\u0026gt; your networking requires a fixin\u0026#39; # Deploy a service with a load balancer kubectl run --image=nginx --port=80 nginx kubectl expose deployment nginx --type=LoadBalancer --name=nginx-service --port=80 --target-port=80 # Editing the load balancer to make the connections external annotations: networking.gke.io/load-balancer-type: External # https://github.com/kubernetes/legacy-cloud-providers/blob/8dfcb684d422483a0bc1ea84008859a5f7950b3a/gce/gce_loadbalancer.go#L218 # https://github.com/kubernetes/legacy-cloud-providers/blob/66bed784d14dbdc0d4a9ae192b1e137e9e295f30/gce/gce_annotations.go#L79 # Deploy a service with nodeport expose kubectl run nginx-nodesport --image=nginx --port=80 kubectl expose deployment nginx-nodesport
--type=NodePort --name=nginx-nodeport --port=80 Getting private gcr.io docker images into the cluster # Not all the apps that we need to run on the cluster are publicly available. Let\u0026rsquo;s say we have our private apps in our own private registry. How can we pull them into the cluster?\nThe general rule of thumb here is that if you can pull the images onto the machine, you can deploy them into the Kubernetes cluster\ngcloud auth configure-docker docker pull gcr.io/\u0026lt;PROJECT ID\u0026gt;/\u0026lt;IMAGE NAME\u0026gt; # Run image in kubernetes cluster kubectl run private-image --image=gcr.io/\u0026lt;PROJECT ID\u0026gt;/\u0026lt;IMAGE NAME\u0026gt; If you\u0026rsquo;re attempting to link a load balancer to a single node # # Hacks: # If you want to do a single node kubernetes \u0026#34;cluster\u0026#34; but still want a load balancer # Reference: https://github.com/kubernetes/kubernetes/issues/65618 # Remove the following label from the node: # node-role.kubernetes.io/master # This marks the node as usable as a backend for a load balancer. Important Note: Calico doesn\u0026rsquo;t seem to work well here # https://github.com/kubernetes/kubeadm/issues/1776\nCalico doesn\u0026rsquo;t seem to work well here, so don\u0026rsquo;t use Calico. Try using flannel instead\n# Don\u0026#39;t use calico here kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml Another important note: Old CNI config can haunt current attempts to start the cluster # When trying to debug why CoreDNS is not starting, refer to:\nhttps://github.com/coredns/deployment/issues/87\nUse some of the debugging steps there to find out why.\nOne possible reason is leftover configuration from a previously installed CNI plugin.
We would need to remove it\n# Cleanup network plugins rm -r /etc/cni/net.d/ Installing Istio # The next step in the march to get Knative working on the cluster is to install some sort of service mesh (a component that controls the application\u0026rsquo;s traffic)\nWe\u0026rsquo;ll first go for a basic setup (with no sidecar injection) before trying out the full-blown Istio components\nThe first step is to install helm, since it is used to generate the Istio manifests\nwget https://get.helm.sh/helm-v3.1.2-linux-amd64.tar.gz tar -xvzf helm-v3.1.2-linux-amd64.tar.gz cd linux-amd64 chmod +x helm mv helm /usr/local/bin/helm Then, we need to run the following commands to get Istio in:\n# https://knative.dev/v0.12-docs/install/installing-istio/ export ISTIO_VERSION=1.3.6 curl -L https://git.io/getLatestIstio | sh - cd istio-${ISTIO_VERSION} for i in install/kubernetes/helm/istio-init/files/crd*yaml; do kubectl apply -f $i; done cat \u0026lt;\u0026lt;EOF | kubectl apply -f - apiVersion: v1 kind: Namespace metadata: name: istio-system labels: istio-injection: disabled EOF The final step is to apply the Istio manifest and get all the Istio components running\n# A lighter template, with just pilot/gateway.
# Based on install/kubernetes/helm/istio/values-istio-minimal.yaml helm template --namespace=istio-system \\ --set prometheus.enabled=false \\ --set mixer.enabled=false \\ --set mixer.policy.enabled=false \\ --set mixer.telemetry.enabled=false \\ `# Pilot doesn\u0026#39;t need a sidecar.` \\ --set pilot.sidecar=false \\ --set pilot.resources.requests.memory=128Mi \\ `# Disable galley (and things requiring galley).` \\ --set galley.enabled=false \\ --set global.useMCP=false \\ `# Disable security / policy.` \\ --set security.enabled=false \\ --set global.disablePolicyChecks=true \\ `# Disable sidecar injection.` \\ --set sidecarInjectorWebhook.enabled=false \\ --set global.proxy.autoInject=disabled \\ --set global.omitSidecarInjectorConfigMap=true \\ --set gateways.istio-ingressgateway.autoscaleMin=1 \\ --set gateways.istio-ingressgateway.autoscaleMax=2 \\ `# Set pilot trace sampling to 100%` \\ --set pilot.traceSampling=100 \\ --set global.mtls.auto=false \\ install/kubernetes/helm/istio \\ \u0026gt; ./istio-lean.yaml kubectl apply -f istio-lean.yaml Installing Knative # And now, we finally come to Knative, the final piece of the technology puzzle that unlocks deploying serverless-like workloads into our Kubernetes cluster.\nWe will be experimenting with several unique features of Knative:\nScale to zero on 0 traffic Traffic splitting between multiple versions of an application Accessing tagged versions of an application Watching auto-scaled services as they handle load Knative relies on the set of technologies deployed above (although you can swap out your \u0026ldquo;service mesh\u0026rdquo; layer).\nRefer to the following document for full instructions and details: https://knative.dev/v0.12-docs/install/knative-with-any-k8s/\n# Installing the knative CRDs kubectl apply --selector knative.dev/crd-install=true \\ --filename https://github.com/knative/serving/releases/download/v0.12.0/serving.yaml \\ --filename
https://github.com/knative/serving/releases/download/v0.12.0/monitoring.yaml #--filename https://github.com/knative/eventing/releases/download/v0.12.0/eventing.yaml \\ # Getting the knative components to run kubectl apply --filename https://github.com/knative/serving/releases/download/v0.12.0/serving.yaml \\ --filename https://github.com/knative/serving/releases/download/v0.12.0/monitoring.yaml #--filename https://github.com/knative/eventing/releases/download/v0.12.0/eventing.yaml \\ kubectl get pods --namespace knative-serving kubectl get pods --namespace knative-eventing #kubectl get pods --namespace knative-monitoring Edit the config-domain ConfigMap so that the URLs generated by Knative Serving resolve to the right IP address (that of the Istio ingress gateway). Reference: https://knative.dev/v0.12-docs/install/installing-istio/\n# Edit the following file kubectl edit cm config-domain --namespace knative-serving apiVersion: v1 kind: ConfigMap metadata: name: config-domain namespace: knative-serving data: # xip.io is a \u0026#34;magic\u0026#34; DNS provider, which resolves all DNS lookups for: # *.{ip}.xip.io to {ip}. =\u0026gt; We would need to use the istio-ingressgateway ip address X.X.X.X.xip.io: \u0026#34;\u0026#34; If we try to deploy a plain nginx container, however, it will not work.
The related issues are listed here.\n# https://github.com/knative/serving/issues/3809 # https://github.com/knative/serving/issues/2142 # https://medium.com/@frederic.lavigne/moving-a-cloud-foundry-app-to-knative-on-ibm-cloud-c0787e3611f1 apiVersion: serving.knative.dev/v1 # Current version of Knative kind: Service metadata: name: helloworld-go # The name of the app namespace: default # The namespace the app will use spec: template: spec: containers: - image: nginx Instead, we can try the YAML below.\napiVersion: serving.knative.dev/v1 # Current version of Knative kind: Service metadata: name: helloworld-go-1 # The name of the app namespace: default # The namespace the app will use spec: template: spec: containers: - image: gcr.io/knative-samples/helloworld-go # The URL to the image of the app env: - name: TARGET # The environment variable printed out by the sample app value: \u0026#34;Go Sample v1\u0026#34; Once there are multiple revisions, we can alter the above file as follows to split traffic between the different revisions of the application:\n# With reference from the following page: https://github.com/knative/docs/blob/master/docs/serving/samples/traffic-splitting/split_sample.yaml apiVersion: serving.knative.dev/v1 # Current version of Knative kind: Service metadata: name: helloworld-go-1 # The name of the app namespace: default # The namespace the app will use spec: template: spec: containerConcurrency: 1 containers: - image: gcr.io/knative-samples/helloworld-go # The URL to the image of the app env: - name: TARGET # The environment variable printed out by the sample app value: \u0026#34;Go Sample v2\u0026#34; traffic: - tag: current revisionName: helloworld-go-1-xxxxx percent: 50 - tag: first revisionName: helloworld-go-1-xxxxx percent: 50 - tag: latest latestRevision: true percent: 0 Testing out scaling # Refer to the following URL: https://github.com/sgotti/knative-docs/tree/master/serving/samples/helloworld-go\nWe would
adjust the helloworld app so that it takes longer to respond to requests: the handler sleeps before responding, which roughly simulates web requests that take a while to complete.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; \u0026#34;os\u0026#34; \u0026#34;strconv\u0026#34; \u0026#34;time\u0026#34; ) func handler(w http.ResponseWriter, r *http.Request) { log.Print(\u0026#34;Hello world received a request.\u0026#34;) defer log.Print(\u0026#34;End hello world request\u0026#34;) target := os.Getenv(\u0026#34;TARGET\u0026#34;) if target == \u0026#34;\u0026#34; { target = \u0026#34;NOT SPECIFIED\u0026#34; } waitTimeEnv := os.Getenv(\u0026#34;WAIT_TIME\u0026#34;) waitTime, _ := strconv.Atoi(waitTimeEnv) log.Printf(\u0026#34;Sleeping for %v\u0026#34;, waitTime) time.Sleep(time.Duration(waitTime) * time.Second) fmt.Fprintf(w, \u0026#34;Hello World: %s!\\n\u0026#34;, target) } func main() { log.Print(\u0026#34;Hello world sample started.\u0026#34;) http.HandleFunc(\u0026#34;/\u0026#34;, handler) log.Fatal(http.ListenAndServe(\u0026#34;:8080\u0026#34;, nil)) } With this app, the WAIT_TIME environment variable (in seconds) controls how long the app takes to respond to each request. For completeness, the Dockerfile is included as well.\nFROM golang ADD .
/go/src/github.com/knative/docs/helloworld RUN go install github.com/knative/docs/helloworld ENTRYPOINT /go/bin/helloworld EXPOSE 8080 We can then build the image and push it to a registry (note the trailing . that sets the build context):\ndocker build -t gcr.io/XXXX/helloworld:v1 . docker push gcr.io/XXXX/helloworld:v1 We can then alter the Knative configuration for this app to try out scaling:\napiVersion: serving.knative.dev/v1 # Current version of Knative kind: Service metadata: name: helloworld-go-1 # The name of the app namespace: default # The namespace the app will use spec: template: spec: containerConcurrency: 1 # Take note of this containers: - image: gcr.io/XXXX/helloworld:v1 env: - name: TARGET # The environment variable printed out by the sample app value: \u0026#34;Go Sample v2\u0026#34; - name: WAIT_TIME value: \u0026#34;2\u0026#34; This creates pods that each take 2s to respond to a web request. Because containerConcurrency is 1, each pod handles only one request at a time, so if the service receives 3 requests/second, 1 pod is not sufficient and Knative scales the service out to handle the traffic.\nFor more serious load testing, consider tools like vegeta or Apache Benchmark (ab).\nNote: To observe extreme cases, such as 20 requests/s arriving at the same time, ensure that the cluster has enough resources to handle them.
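To put rough numbers on this: the number of requests in flight is approximately the arrival rate multiplied by the response time (Little's law). With containerConcurrency set to 1, each pod holds at most one of those requests. The Python sketch below is only a back-of-the-envelope estimate under that assumption, and estimate_pods is a hypothetical helper. Knative's autoscaler actually averages observed concurrency over a time window and applies its own target settings, so real pod counts can differ.

```python
import math


def estimate_pods(requests_per_second, seconds_per_request, container_concurrency):
    # Little's law: average in-flight requests = arrival rate x time spent per request.
    in_flight = requests_per_second * seconds_per_request
    if in_flight == 0:
        return 0  # no traffic at all: Knative can scale the service to zero
    # Each pod accepts at most container_concurrency requests at once; round up to whole pods.
    return math.ceil(in_flight / container_concurrency)


# WAIT_TIME=2 seconds, 3 requests/s, containerConcurrency=1
print(estimate_pods(3, 2, 1))   # prints 6
# The extreme case from the note: 20 requests/s with the same 2 s delay
print(estimate_pods(20, 2, 1))  # prints 40
```

The 20 requests/s case implies roughly 40 pods. That is why the note asks for a cluster with plenty of spare capacity before running that experiment.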
If there are insufficient resources, the cluster may start starving critical components in order to serve the web requests.\nLogging and Monitoring in Knative # If you deploy the Knative monitoring stack, you get both Grafana + Prometheus for metrics and an EFK stack (Elasticsearch, Fluentd, Kibana) for logs, which serve as the monitoring and logging platforms.\nThe Grafana dashboard can be viewed right away: list the available services and go to the NodePort where the dashboard is exposed.\nTo view the Kibana UI, we first need to label all the nodes in the cluster so that the Fluentd daemonset can run on them.\n# Add this line under the labels beta.kubernetes.io/fluentd-ds-ready: \u0026#34;true\u0026#34; # Verify the nodes that have this daemonset running kubectl get nodes --selector beta.kubernetes.io/fluentd-ds-ready=true kubectl get daemonset fluentd-ds --namespace knative-monitoring Then, get local kubectl access to the cluster.\nRun the following command:\nkubectl proxy # Then, go to the following link: http://localhost:8001/api/v1/namespaces/knative-monitoring/services/kibana-logging/proxy/app/kibana Tracing is also available via this link: http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/ -\u0026gt; make sure you run the kubectl proxy command before accessing it.\nAdditional debugging steps # While writing this article, I experimented quite a bit. I tried installing the latest Istio and Knative (did not work). I also tried the Calico CNI (partly because it is the first CNI in the list on the kubeadm page), which did not work either.\nIf you\u0026rsquo;re here for the guide to successfully deploy Knative on Google Compute Engine nodes, you don\u0026rsquo;t need to read this portion.
But if you wish to explore and debug further issues, read on.\nAttempting to deploy the latest Knative on the latest Istio (as of March 2020) # Istio was at 1.5 and Knative at 0.13.\nIssue: the custom domain job was always in error, complaining that it was unable to reach the Kubernetes API server.\nThe initial investigation assumed that this was caused by CNI issues:\nhttps://github.com/kubernetes-sigs/metrics-server/issues/375 https://github.com/kubernetes/kubeadm/issues/1817 # The url being accessed https://10.96.0.1:443/api/v1/namespaces/knative-serving/configmaps/config-domain However, it turns out that you cannot ping the IP address of the Kubernetes API server. 10.96.0.1 is a virtual service IP that kube-proxy implements with iptables rules for TCP traffic to the service port, so pings to it get no reply. Also, if you run busybox and then run nslookup kubernetes, the pod is able to resolve the address but still cannot ping it.\nAfter further research, I found that such data can be accessed as follows:\nTOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) curl -H \u0026#34;Authorization: Bearer $TOKEN\u0026#34; --insecure https://10.96.0.1/api/v1/namespaces/knative-serving/configmaps/config-domain However, the pod must be allowed to query the API server for that data. If you do not supply a token, you are treated as an anonymous user, which prevents you from pulling any data from the API server.\nIt is possible to attach a service account to a pod (knative-serving already creates the controller and default service accounts).
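The token-based curl call above can also be reproduced from a Python shell inside a debug pod, which is handy when poking at RBAC. This is a minimal standard-library sketch, not what Knative itself does. It assumes the usual service-account mount at /var/run/secrets/kubernetes.io/serviceaccount (containing token and ca.crt) and the in-cluster name kubernetes.default.svc. The helper names are hypothetical. Unlike the curl command, it verifies TLS against the cluster CA instead of using --insecure.

```python
import json
import os
import ssl
import urllib.request

# Where Kubernetes mounts the credentials of the pod's service account.
SA_DIR = '/var/run/secrets/kubernetes.io/serviceaccount'
API_SERVER = 'https://kubernetes.default.svc'


def build_configmap_request(namespace, name, sa_dir=SA_DIR, api_server=API_SERVER):
    # Without this bearer token the API server treats the caller as anonymous.
    with open(os.path.join(sa_dir, 'token')) as token_file:
        token = token_file.read().strip()
    url = api_server + '/api/v1/namespaces/' + namespace + '/configmaps/' + name
    return urllib.request.Request(url, headers={'Authorization': 'Bearer ' + token})


def fetch_configmap(namespace, name, sa_dir=SA_DIR):
    # Verify the API server certificate with the cluster CA mounted next to the token.
    context = ssl.create_default_context(cafile=os.path.join(sa_dir, 'ca.crt'))
    request = build_configmap_request(namespace, name, sa_dir=sa_dir)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

Inside the pod, fetch_configmap('knative-serving', 'config-domain') should return the ConfigMap as a dict when the service account is allowed to get configmaps in that namespace. An HTTP 403, raised as urllib.error.HTTPError, points at missing RBAC permissions rather than a networking problem.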
The service account attached to a pod determines which API data it is allowed to pull from the API server.\nTo debug further, I tried creating the following YAML files to grant the required roles and permissions to the service account so that it can pull the required data, but the issue persisted.\napiVersion: rbac.authorization.k8s.io/v1beta1 kind: Role metadata: name: knative-role namespace: knative-serving rules: - apiGroups: - \u0026#34;\u0026#34; resources: - pods - secrets - configmaps verbs: - get - watch - list apiVersion: rbac.authorization.k8s.io/v1beta1 kind: RoleBinding metadata: name: knative-role-binding namespace: knative-serving roleRef: kind: Role name: knative-role apiGroup: rbac.authorization.k8s.io subjects: - kind: ServiceAccount name: controller namespace: knative-serving The conclusion is that further investigation is needed to find out why that specific component is not fetching the ConfigMap as expected.\nLessons from a failed deploy # root@test-instance-1:~# kubeadm init --config gce.yaml W0320 09:43:56.874106 27922 validation.go:28] Cannot validate kube-proxy config - no validator is available W0320 09:43:56.874182 27922 validation.go:28] Cannot validate kubelet config - no validator is available [init] Using Kubernetes version: v1.17.4 [preflight] Running pre-flight checks [WARNING IsDockerSystemdCheck]: detected \u0026#34;cgroupfs\u0026#34; as the Docker cgroup driver. The recommended driver is \u0026#34;systemd\u0026#34;.
Please follow the guide at https://kubernetes.io/docs/setup/cri/ [preflight] Pulling images required for setting up a Kubernetes cluster [preflight] This might take a minute or two, depending on the speed of your internet connection [preflight] You can also perform this action in beforehand using \u0026#39;kubeadm config images pull\u0026#39; [kubelet-start] Writing kubelet environment file with flags to file \u0026#34;/var/lib/kubelet/kubeadm-flags.env\u0026#34; [kubelet-start] Writing kubelet configuration to file \u0026#34;/var/lib/kubelet/config.yaml\u0026#34; [kubelet-start] Starting the kubelet [certs] Using certificateDir folder \u0026#34;/etc/kubernetes/pki\u0026#34; [certs] Generating \u0026#34;ca\u0026#34; certificate and key [certs] Generating \u0026#34;apiserver\u0026#34; certificate and key [certs] apiserver serving cert is signed for DNS names [test-instance-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.128.0.13 10.128.0.13 35.194.27.204 10.96.0.1] [certs] Generating \u0026#34;apiserver-kubelet-client\u0026#34; certificate and key [certs] Generating \u0026#34;front-proxy-ca\u0026#34; certificate and key [certs] Generating \u0026#34;front-proxy-client\u0026#34; certificate and key [certs] Generating \u0026#34;etcd/ca\u0026#34; certificate and key [certs] Generating \u0026#34;etcd/server\u0026#34; certificate and key [certs] etcd/server serving cert is signed for DNS names [test-instance-1 localhost] and IPs [10.128.0.13 127.0.0.1 ::1] [certs] Generating \u0026#34;etcd/peer\u0026#34; certificate and key [certs] etcd/peer serving cert is signed for DNS names [test-instance-1 localhost] and IPs [10.128.0.13 127.0.0.1 ::1] [certs] Generating \u0026#34;etcd/healthcheck-client\u0026#34; certificate and key [certs] Generating \u0026#34;apiserver-etcd-client\u0026#34; certificate and key [certs] Generating \u0026#34;sa\u0026#34; key and public key [kubeconfig] Using kubeconfig folder 
\u0026#34;/etc/kubernetes\u0026#34; [kubeconfig] Writing \u0026#34;admin.conf\u0026#34; kubeconfig file [kubeconfig] Writing \u0026#34;kubelet.conf\u0026#34; kubeconfig file [kubeconfig] Writing \u0026#34;controller-manager.conf\u0026#34; kubeconfig file [kubeconfig] Writing \u0026#34;scheduler.conf\u0026#34; kubeconfig file [control-plane] Using manifest folder \u0026#34;/etc/kubernetes/manifests\u0026#34; [control-plane] Creating static Pod manifest for \u0026#34;kube-apiserver\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-apiserver\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-controller-manager\u0026#34; [control-plane] Creating static Pod manifest for \u0026#34;kube-controller-manager\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-apiserver\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-controller-manager\u0026#34; W0320 09:44:03.363012 27922 manifests.go:214] the default kube-apiserver authorization-mode is \u0026#34;Node,RBAC\u0026#34;; using \u0026#34;Node,RBAC\u0026#34; [control-plane] Creating static Pod manifest for \u0026#34;kube-scheduler\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-apiserver\u0026#34; [controlplane] Adding extra host path mount \u0026#34;cloud\u0026#34; to \u0026#34;kube-controller-manager\u0026#34; W0320 09:44:03.364224 27922 manifests.go:214] the default kube-apiserver authorization-mode is \u0026#34;Node,RBAC\u0026#34;; using \u0026#34;Node,RBAC\u0026#34; [etcd] Creating static Pod manifest for local etcd in \u0026#34;/etc/kubernetes/manifests\u0026#34; [kubelet-check] It seems like the kubelet isn\u0026#39;t running or healthy. 
[kubelet-check] The HTTP call equal to \u0026#39;curl -sSL http://localhost:10248/healthz\u0026#39; failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn\u0026#39;t running or healthy. [kubelet-check] The HTTP call equal to \u0026#39;curl -sSL http://localhost:10248/healthz\u0026#39; failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn\u0026#39;t running or healthy. [kubelet-check] The HTTP call equal to \u0026#39;curl -sSL http://localhost:10248/healthz\u0026#39; failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused. [kubelet-check] It seems like the kubelet isn\u0026#39;t running or healthy. [kubelet-check] The HTTP call equal to \u0026#39;curl -sSL http://localhost:10248/healthz\u0026#39; failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused. One place to look during this failure is the kubelet logs (for example via journalctl -u kubelet):\nMar 20 12:34:22 test-instance-1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart. Mar 20 12:34:22 test-instance-1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent. Mar 20 12:34:22 test-instance-1 systemd[1]: Started kubelet: The Kubernetes Node Agent. Mar 20 12:34:22 test-instance-1 kubelet[13216]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet\u0026#39;s --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information. Mar 20 12:34:22 test-instance-1 kubelet[13216]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet\u0026#39;s --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 20 12:34:22 test-instance-1 kubelet[13216]: I0320 12:34:22.288202 13216 server.go:416] Version: v1.17.4 Mar 20 12:34:22 test-instance-1 kubelet[13216]: W0320 12:34:22.288527 13216 plugins.go:115] WARNING: gce built-in cloud provider is now deprecated. The GCE provider is deprecated and will be removed in a future release Mar 20 12:34:22 test-instance-1 kubelet[13216]: I0320 12:34:22.288699 13216 gce.go:265] Using GCE provider config \u0026amp;{Global:{TokenURL: TokenBody: ProjectID:healty-rarity-238313 NetworkProjectID: NetworkName: SubnetworkName: SecondaryRangeName: NodeTags:[nodeports] NodeInstancePrefix:test Regional:false Multizone:true APIEndpoint: ContainerAPIEndpoint: LocalZone: AlphaFeatures:[]}} Mar 20 12:34:22 test-instance-1 kubelet[13216]: I0320 12:34:22.293316 13216 gce.go:866] Using existing Token Source \u0026amp;oauth2.reuseTokenSource{new:google.computeSource{account:\u0026#34;\u0026#34;, scopes:[]string(nil)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)} Mar 20 12:34:22 test-instance-1 kubelet[13216]: W0320 12:34:22.426969 13216 gce.go:475] Could not retrieve network \u0026#34;default\u0026#34;; err: googleapi: Error 404: The resource \u0026#39;projects/XXXX-238313\u0026#39; was not found, notFound Mar 20 12:34:22 test-instance-1 kubelet[13216]: F0320 12:34:22.465969 13216 server.go:273] failed to run Kubelet: could not init cloud provider \u0026#34;gce\u0026#34;: unexpected response listing zones: googleapi: Error 404: The resource \u0026#39;projects/projects/XXXX-238313\u0026#39; was not found, notFound Mar 20 12:34:22 test-instance-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a Mar 20 12:34:22 test-instance-1 systemd[1]: kubelet.service: Unit entered failed state. Mar 20 12:34:22 test-instance-1 systemd[1]: kubelet.service: Failed with result \u0026#39;exit-code\u0026#39;. 
^C The logs indicate the reason for the failure, which in this case is a typo in the project ID. Note the line saying that the resource projects/XXXX-238313 cannot be found.\n","date":"9 March 2020","externalUrl":null,"permalink":"/trying-knative-from-scratch/","section":"Posts","summary":"This blog post is still being updated\nVarious cloud providers started offering serverless containers as a service. This is a service where developers can just create a container and then, pass that container over to the cloud provider and then forget about it. The cloud provider would deal with the scaling, provisioning of resources to host the applications, deployment, monitoring etc.\n","title":"Trying Knative from scratch","type":"posts"},{"content":"These are some notes I took while experimenting further with Golang. This article mainly explores embedded structs and interfaces to see how they work.\nUse the Go Playground to see the examples in action.\nOne can combine Go interfaces together to form larger interfaces.\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type ZZZ struct{} func (z ZZZ) Hahax() { fmt.Println(\u0026#34;ZZZ Hahax\u0026#34;) } func (z ZZZ) Miao() { fmt.Println(\u0026#34;ZZZ Miao\u0026#34;) } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } Interface CCC is formed from interfaces AAA and BBB In order to fulfill the requirements of CCC, struct ZZZ needs to implement both the Hahax and Miao methods. To understand where each method is being called from, each method prints out the struct it belongs to and its own name.
The code above outputs the following:\nZZZ Hahax ZZZ Miao We can take apart struct ZZZ and compose it from multiple structs instead.\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type DD1 struct{} func (z DD1) Hahax() { fmt.Println(\u0026#34;DD1 Hahax\u0026#34;) } type DD2 struct{} func (z DD2) Miao() { fmt.Println(\u0026#34;DD2 Miao\u0026#34;) } type ZZZ struct { DD1 DD2 } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } Interface CCC is formed from interfaces AAA and BBB In order to fulfill the requirements of CCC, struct ZZZ needs to implement both the Hahax and Miao methods. Struct ZZZ is composed of structs DD1 and DD2. DD1 implements the Hahax method while DD2 implements the Miao method Similarly, each method prints out the struct it belongs to and its own name. The code above outputs the following:\nDD1 Hahax DD2 Miao However, let\u0026rsquo;s experiment further. What if ZZZ, which is already composed of DD1 and DD2 (which together fulfill the requirements of CCC), also implements the Hahax method?\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type DD1 struct{} func (z DD1) Hahax() { fmt.Println(\u0026#34;DD1 Hahax\u0026#34;) } type DD2 struct{} func (z DD2) Miao() { fmt.Println(\u0026#34;DD2 Miao\u0026#34;) } type ZZZ struct { DD1 DD2 } func (z ZZZ) Hahax() { fmt.Println(\u0026#34;ZZZ Hahax\u0026#34;) } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } The above code outputs the following instead:\nZZZ Hahax DD2 Miao Notice how, compared to the previous situation, it prints ZZZ Hahax instead of DD1 Hahax. So there is some sort of hierarchy of levels when methods are being called.
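That hierarchy is what the Go specification, in its section on selectors, calls depth. x.f denotes the field or method f at the shallowest depth in the type of x, and the selector is illegal if there is not exactly one f at that depth. As a rough illustration, the Python model below encodes that rule for the struct layouts used in this post. It is a hypothetical sketch, not how the compiler is implemented, and it ignores fields, pointer receivers and embedded interfaces.

```python
# Hypothetical type table: name -> (methods declared directly on it, embedded type names).
TYPES = {
    'DD1': ({'Hahax'}, []),
    'DD2': (set(), ['DD3']),
    'DD3': ({'Miao'}, []),
    'ZZZ': ({'Hahax'}, ['DD1', 'DD2']),
}


def resolve(types, type_name, method):
    # Walk the embedding tree one depth at a time, starting with the type itself (depth 0).
    level = [type_name]
    while level:
        owners = [t for t in level if method in types[t][0]]
        if len(owners) == 1:
            return owners[0]
        if len(owners) > 1:
            # Two candidates at the same shallowest depth: Go rejects the selector.
            raise ValueError('ambiguous selector ' + type_name + '.' + method)
        # Nothing at this depth, so descend into everything embedded one level down.
        level = [e for t in level for e in types[t][1]]
    return None  # the method is not promoted at any depth


print(resolve(TYPES, 'ZZZ', 'Hahax'))  # prints ZZZ
print(resolve(TYPES, 'ZZZ', 'Miao'))   # prints DD3
```

Giving DD1 its own Miao makes the model return DD1, because depth 1 beats DD3 at depth 2. Giving both DD1 and DD2 a Miao raises the ambiguity error. Both results mirror the later experiments in this post.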
If the methods defined at that level include Hahax, that one is called; otherwise, Go goes down through the embedded structs and calls the method found there.\nNow, let\u0026rsquo;s experiment further. What if DD2 does not implement the Miao method but instead embeds DD3, which implements Miao?\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type DD1 struct{} func (z DD1) Hahax() { fmt.Println(\u0026#34;DD1 Hahax\u0026#34;) } type DD2 struct { DD3 } type DD3 struct{} func (z DD3) Miao() { fmt.Println(\u0026#34;DD3 Miao\u0026#34;) } type ZZZ struct { DD1 DD2 } func (z ZZZ) Hahax() { fmt.Println(\u0026#34;ZZZ Hahax\u0026#34;) } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } As expected, the code above outputs the following:\nZZZ Hahax DD3 Miao So Go searches down through the embedded structs and uses the Miao method found at the shallowest level (here, DD3 inside DD2).\nIf DD1 also implements the Miao method, we would expect DD1 Miao to be printed instead of DD3 Miao, since DD1 sits one level higher than DD3.\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type DD1 struct{} func (z DD1) Hahax() { fmt.Println(\u0026#34;DD1 Hahax\u0026#34;) } func (z DD1) Miao() { fmt.Println(\u0026#34;DD1 Miao\u0026#34;) } type DD2 struct { DD3 } type DD3 struct{} func (z DD3) Miao() { fmt.Println(\u0026#34;DD3 Miao\u0026#34;) } type ZZZ struct { DD1 DD2 } func (z ZZZ) Hahax() { fmt.Println(\u0026#34;ZZZ Hahax\u0026#34;) } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } The following is the output for this piece of code:\nZZZ Hahax DD1 Miao Let\u0026rsquo;s remove DD3 from the latest iteration of the code and have DD2 implement the Miao method directly.
That makes things ambiguous: which Miao method should be used, given that the embedded structs DD1 and DD2 are on the same level?\npackage main import ( \u0026#34;fmt\u0026#34; ) type AAA interface { Hahax() } type BBB interface { Miao() } type CCC interface { AAA BBB } type DD1 struct{} func (z DD1) Hahax() { fmt.Println(\u0026#34;DD1 Hahax\u0026#34;) } func (z DD1) Miao() { fmt.Println(\u0026#34;DD1 Miao\u0026#34;) } type DD2 struct {} func (z DD2) Miao() { fmt.Println(\u0026#34;DD2 Miao\u0026#34;) } type ZZZ struct { DD1 DD2 } func (z ZZZ) Hahax() { fmt.Println(\u0026#34;ZZZ Hahax\u0026#34;) } func Printer(c CCC) { c.Hahax() c.Miao() } func main() { z := ZZZ{} Printer(z) } We now get the following:\n./prog.go:52:9: ZZZ.Miao is ambiguous ./prog.go:52:9: cannot use z (type ZZZ) as type CCC in argument to Printer: ZZZ does not implement CCC (missing Miao method) The Go compiler cannot decide which one to use, so the program fails to compile. This is a compile-time error rather than a runtime panic.\n","date":"10 January 2020","externalUrl":null,"permalink":"/golang-composition/","section":"Posts","summary":"These are some notes I took while experimenting and playing around with Golang further. This article is mainly exploring embedded structs and interfaces to experiment how they work etc.\nUse Golang playground in order to see how it works in action\n","title":"Golang composition","type":"posts"},{"content":"These are some notes that I took while taking my driving license - Class 3 in Singapore at CDC from August 2019 to December 2019. Some of the advice may/may not apply to you when learning to drive - so take all advice with a grain of salt.\nSome of the advice below may also apply to those taking Class 3A, though they can most likely skip all the bits regarding clutch control.
That would make certain stages way easier, especially the slope.\nReminders # List of important things to take note of (read them just before going for the TP practical test)\nSignal then blindspot Always check blindspot before turning S course - full lock before exiting (prevents points being deducted for a wide turn) Reverse parking needs delayed turning Be calm while switching lanes - don\u0026rsquo;t rush to the next lane (do proper rear view mirror and side mirror checks) Remember that reference points for reverse parking near the crank course are different After an emergency stop, do the same checks as when moving off (signal left, check blindspots etc) Turn at the shoulder only for U-turns and going out of reverse parking Keep more to the left at zebra crossings (to prevent motorcyclists from squeezing in on your left) At traffic junctions, always check pedestrians, then traffic, then blindspot Drive slower in the circuit so that you have more time to turn Gear 2 for turns (on road) In circuit - don\u0026rsquo;t inch out at traffic lights\u0026hellip; you might later get stuck at the yellow box Accelerator should be on the ball of the foot Look out to where you want to go (through the side mirrors etc) Bus stop yellow arrow is a yellow box Long filter lane - check 3 times.
1 check before entering the filter lane, 1 check when turning, last check before going out Turn right from reverse parking with crank course - towards parallel parking - move slightly left, then when the mirror crosses the solid white line - full lock right Moving off # Signal Make sure in gear 1 Check mirror and blindspot Check blindspot Release handbrake Slope # Go up and stop before the yellow line - need to accelerate a bit Signal to the kerb Handbrake on Then prepare to move off Biting point Accel to 2500 Signal opposite side Release handbrake slowly (but not too slowly) Put more accel and release more clutch accordingly Cancel signal on top Stop at the stop line if there is another vehicle in front Release the clutch a little and let the vehicle roll Full on clutch on the way down with no brake Brake before the end and clutch in\u0026hellip; hard brake is fine Kerb side stopping # Signal left Check blindspot Inch towards stopping point Keep checking left side mirrors Once stopped\u0026hellip; hazard light Put to neutral Handbrake on Crank course # Full lock of steering wheel at the door lock Once vehicle is straightened turn 2 rounds back No need to check blindspot in the course S course # You should not be at the centre of the road at the end Be more on the right of the road at the end - else the rear left wheel might hit the kerb when turning out Check blindspot before turning out Direction change # Go into it Go as far in as you possibly can Stop Put to reverse Check rear view Turn and check back Check spot is clear Reverse till required mark Check left blind spot before turning steering wheel Ensure parallel to kerb inside If too left, move out to the right before turning left out.
If you do not do this, your rear left wheel will go over the kerb If too right, it is ok Perpendicular parking (near s course) # Be closer to the line Move till next parking at kerb at shoulder Begin to turn when black bar in rear window touches right rear corner of parking spot Perpendicular parking (near crank course) # Be at the centre of the lane Move till next kerb at shoulder Extend the yellow bar to the ends to become a lot Right bar of rear left window to cross it and then full lock Perpendicular parking (common) # Check rear view Check back mirror Check spot is clear Reverse a little Check swing out Check mirrors Reverse till ok Parallel parking # Kerb side stop - don\u0026rsquo;t be too close Left signal Stop when side mirror at stop line or sign to next bay Put to reverse Check rear view Check back window Move till front left door handle reaches edge of kerb Check right blindspot Full lock left Look at right side mirror, check for middle for white and black kerb If 3/4 white or between white and black kerb, quickly make wheels straight Once at mid between kerb, stop vehicle and adjust mirrors CHECK REAR VIEW AND REAR WINDOW BEFORE CONTINUING continue till rear wheel crosses yellow line full lock right move as far as you possibly can switch to 1st gear straighten vehicle/wheel put to neutral and handbrake on await approval from tester/instructor when going out\u0026hellip; DO IT SLOWLY. USE CLUTCH ON-OFF TECHNIQUE Parallel parking corrections # If rear wheel not in, move forward as much as you can Full lock left, move as far back as you need and straighten vehicle Experiences during test # The car used in practical driving lessons is vastly different from the cars used in TP. Cars used in TP generally seem to have more effective brake/clutch systems.
Judge the quality of the brakes/clutch/accelerator accordingly - that is why the practice round before the actual TP is very important, as it lets you get used to the car After the practice round (you will be accompanied by a random instructor - not much advice will be given, only a quick test of the circuit + external road), you will stop the vehicle in front of the main lobby of the CDC driving school and will be asked to get off. The instructor will park the car (instructions on where to find your car will be provided) When the tester and you are in the car, you will need to start the vehicle, do safety checks and then leave the parking lot (don\u0026rsquo;t forget to turn out of the parking lot using the shoulder as reference) - check that traffic is clear before leaving the parking lot. ","date":"20 December 2019","externalUrl":null,"permalink":"/tips-for-class-3-license-in-singapore/","section":"Posts","summary":"These are some notes that I took while taking my driving license - Class 3 in Singapore at CDC from August 2019 to December 2019. Some of the advice may/may not apply to you when learning to drive - so take all advice with a grain of salt.\n","title":"Tips for Class 3 license in Singapore","type":"posts"},{"content":"","date":"18 October 2019","externalUrl":null,"permalink":"/categories/microsoft/","section":"Article Categories","summary":"","title":"Microsoft","type":"categories"},{"content":"","date":"18 October 2019","externalUrl":null,"permalink":"/tags/microsoft/","section":"Technology Tags","summary":"","title":"Microsoft","type":"tags"},{"content":"I\u0026rsquo;ve recently needed to find a way to use the Graph APIs offered by Microsoft to receive data from and send data to the various Microsoft services. However, the documentation for it is scattered, with various \u0026ldquo;deprecated\u0026rdquo; versions of the documentation everywhere. Stranger still, the documentation emphasizes using the SDKs rather than calling the APIs directly.
(I mean, it\u0026rsquo;s true that the SDKs make it way easier to try things out by wrapping API calls into plain function calls, but sometimes it\u0026rsquo;s a hassle to go and learn yet another library.) It\u0026rsquo;s really quite a pain to find relevant documentation on this.\nIf you\u0026rsquo;re new to OAuth2 and you find that you need to call the API in some very specific way that the SDKs do not cover, it would be best to first look at Google\u0026rsquo;s OAuth2 documentation. It is precise, easier to find and understand, and it actually documents the approach if you are going to do it via REST. Otherwise, getting this to work would require quite a bit of experimentation. (Expect to see a lot of HTTP 400 errors without understanding which part of the request is the reason it is not working as expected.)\nI\u0026rsquo;ve dumped the simplest version of a Flask app that talks to the Microsoft Graph API here, in case you need it for reference.\nAdd the following code to app.py\nimport json import logging import requests from flask import Flask, request, redirect app = Flask(__name__) with open(\u0026#34;config.json\u0026#34;, \u0026#39;r\u0026#39;) as raw_data: config_data = json.load(raw_data) @app.route(\u0026#39;/\u0026#39;) def redirected(): code = request.args.get(\u0026#34;code\u0026#34;) if code is not None: resp = requests.post(\u0026#34;https://login.microsoftonline.com/{}/oauth2/v2.0/token\u0026#34;.format(config_data[\u0026#34;tenant_id\u0026#34;]), data={ \u0026#34;client_id\u0026#34;: config_data[\u0026#34;client_id\u0026#34;], \u0026#34;scope\u0026#34;: \u0026#34;https://graph.microsoft.com/User.Read\u0026#34;, \u0026#34;code\u0026#34;: code, \u0026#34;redirect_uri\u0026#34;: \u0026#34;http://localhost:8000\u0026#34;, \u0026#34;grant_type\u0026#34;: \u0026#34;authorization_code\u0026#34;, \u0026#34;client_secret\u0026#34;: config_data[\u0026#34;client_secret\u0026#34;] }) logging.warning(code) logging.warning(resp)
logging.warning(resp.content) return str(resp.content) + \u0026#34;\\n\u0026#34; + code else: return \u0026#39;Hello, World!\u0026#39; @app.route(\u0026#39;/login\u0026#39;) def login(): return redirect(\u0026#34;https://login.microsoftonline.com/{}/oauth2/v2.0/authorize?client_id={}\u0026amp;response_type=code\u0026amp;redirect_uri=http%3A%2F%2Flocalhost:8000\u0026amp;response_mode=query\u0026amp;scope=openid%20offline_access%20https%3A%2F%2Fgraph.microsoft.com%2Fuser.read\u0026amp;state=12345\u0026#34;.format(config_data[\u0026#34;tenant_id\u0026#34;], config_data[\u0026#34;client_id\u0026#34;])) @app.route(\u0026#39;/hahax\u0026#39;) def final(): return \u0026#34;HAHAX ENDED\u0026#34; if __name__ == \u0026#39;__main__\u0026#39;: app.run() Don\u0026rsquo;t forget to add this config.json file. The values are not included here for obvious reasons; get your own.\n{ \u0026#34;tenant_id\u0026#34;: \u0026#34;FIND THIS VALUE ON AZURE PORTAL\u0026#34;, \u0026#34;client_id\u0026#34;: \u0026#34;FIND THIS VALUE ON AZURE PORTAL\u0026#34;, \u0026#34;client_secret\u0026#34;: \u0026#34;FIND THIS VALUE ON AZURE PORTAL\u0026#34; } Before you run this, make sure to go to the Azure portal (even if you don\u0026rsquo;t use Azure, you still need to go there to activate the APIs and create the auth profiles for your account)\nAzure Portal (https://portal.azure.com) -\u0026gt; Azure Active Directory -\u0026gt; App Registration Create an application under App Registration Click on the newly created app to manage it. Near the top of the panel, there is an Endpoints button -\u0026gt; this gives you the authorization and token endpoints that you need to do OAuth2 logins for the application.
You also get your client ID here Go to the Authentication tab to add the redirect URIs that you need to authenticate Go to the Certificates \u0026amp; Secrets tab to get the client secret that you need to authenticate Note: instructions that relate to UIs are generally vague and imprecise, partly because UIs change way too often, making them hard to document reliably. In future posts, if there is ever an easy way to do this via a CLI, I will add it here.\nAfter that, fill in a config.json and run the Python app.\nThis is the gist of what\u0026rsquo;s happening:\nThe user goes to the /login endpoint of your server. This redirects them to log in to their Microsoft account. Expect the usual Microsoft interface here; it is similar to a user logging in to their account. Microsoft redirects the user using the redirect URI that you have specified in the Azure portal and in the request You receive the code from Microsoft\u0026rsquo;s redirect and send it in another POST request to exchange it for an access_token, a refresh_token etc. The token has a limited lifetime during which it provides access to the various Microsoft Graph APIs on the user\u0026rsquo;s behalf Run the Python code above with this:\nFLASK_RUN_PORT=8000 flask run The final response is an HTML page showing the JSON response that contains the access_token you need to access the Graph APIs. Copy it and try it out with a curl command. I assume that you have at least added the necessary scopes on the Azure portal to allow your app to query your own account to try things out.
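The /login route above hand-encodes its query string, and the access token then has to be copied into curl by hand. As a minimal sketch (the helper names `build_authorize_url` and `bearer_headers` are hypothetical and not part of any Microsoft SDK), both steps could be done with the Python standard library, reusing the query parameters from the /login route:

```python
import json
from urllib.parse import quote, urlencode

# Authority host used by the login and token URLs in app.py above.
AUTHORITY = 'https://login.microsoftonline.com'


def build_authorize_url(tenant_id, client_id, redirect_uri, scopes, state):
    # Hypothetical helper: let urlencode percent-encode every value
    # (quote turns spaces into %20) instead of hand-writing the query string.
    query = urlencode(
        {
            'client_id': client_id,
            'response_type': 'code',
            'redirect_uri': redirect_uri,
            'response_mode': 'query',
            'scope': ' '.join(scopes),
            'state': state,
        },
        quote_via=quote,
    )
    return f'{AUTHORITY}/{tenant_id}/oauth2/v2.0/authorize?{query}'


def bearer_headers(token_response_body):
    # Hypothetical helper: token_response_body is the raw JSON body returned
    # by the /token endpoint (resp.content in the Flask handler above).
    token = json.loads(token_response_body)['access_token']
    return {'Authorization': f'Bearer {token}'}
```

For example, login() could return redirect(build_authorize_url(config_data['tenant_id'], config_data['client_id'], 'http://localhost:8000', ['openid', 'offline_access', 'https://graph.microsoft.com/user.read'], '12345')), and headers=bearer_headers(resp.content) could be passed to requests.get('https://graph.microsoft.com/v1.0/me') instead of running curl. Letting urlencode do the escaping can help avoid 400 errors caused by a mistyped, hand-encoded query string.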
Refer to the scope in the root server call to see which scopes to add on the Azure Portal\ncurl -H \u0026#34;Authorization: Bearer ACCESS-TOKEN-XXXXX\u0026#34; https://graph.microsoft.com/v1.0/me With that, you get a JSON response providing information about yourself as a user of the Microsoft account.\n","date":"18 October 2019","externalUrl":null,"permalink":"/microsoft-graph-api-authentication/","section":"Posts","summary":"I’ve recently needed to find a way to use the Graph APIs offered by Microsoft to receive data from and send data to various Microsoft services. However, the documentation for it is scattered, with various “deprecated” versions of the documentation everywhere. What is even weirder is the emphasis on using the SDKs rather than calling the APIs directly. (I mean, it’s true that the SDKs make it way easier to try things out by wrapping API calls into plain function calls, but sometimes it’s a hassle to go and learn yet another library.) It’s really quite a pain to find relevant documentation on this.\n","title":"Microsoft Graph API Authentication","type":"posts"},{"content":"While working on a couple of projects that would be deployed on Google Cloud Run, I realized that several of them tend to share a similar structure. Given the number of repositories I typically handle on a personal basis, as well as the amount of context switching I need to move between projects, it would be ideal for all such projects to be automated as much as possible.\nThis is a small list of features that would be great to templatize across projects:\nCloud Build templates Include the handling of secrets via Google KMS if necessary.
Handle the differences between HTTP-based and message-based Cloud Run deployments (they have slight differences which can easily trip you up while you are rushing out a project) Handle small issues when dealing with Cloud Build; previously, I found out that Cloud Build would generate a .gcloudignore file from the .gitignore if one doesn\u0026rsquo;t exist. Let\u0026rsquo;s say you were deploying to Google Cloud Functions and you had added *.json to .gitignore. That would mean that *.json would be added to .gcloudignore, causing all JSON files to be ignored (even though the files could have been decoded/generated during Cloud Build steps) Convenience commands Make commands to test locally Make commands to fire Pub/Sub messages Make commands to alter topics/subscriptions Able to templatize conveniently without resorting to git hacks or extra libs. (Previous methods relied on the git provider letting you set one of the projects that you own as a template. GitLab used to allow this, but it suddenly became a paid feature; a painful lesson indeed) From the above, it seems that creating a template would be nice. To aid with this, there is a tool called cookiecutter. Here is a link to the project: https://github.com/cookiecutter/cookiecutter\nThis is a template that can be generated via cookiecutter: https://github.com/hairizuanbinnoorazman/cookiecutter-cloud-run-go\nTo use the tool, run the following after installing cookiecutter on your computer:\ncookiecutter https://github.com/hairizuanbinnoorazman/cookiecutter-cloud-run-go It is a prompt-based CLI tool; it provides options that you can alter accordingly. This is an example of what it looks like when you run it now:\ncookiecutter https://github.com/hairizuanbinnoorazman/cookiecutter-cloud-run-go You\u0026#39;ve downloaded /Users/hairizuannoorazman/.cookiecutters/cookiecutter-cloud-run-go before. Is it okay to delete and re-download it? [yes]: no Do you want to re-use the existing version?
[yes]: yes golang_mod_name [github.com/sample/sample]: mod_name [sample]: app_name [sample]: Select type: 1 - http 2 - msg Choose from 1, 2 [1]: There might be more options in the future as more features are added to this template to support more complex cases.\n","date":"5 October 2019","externalUrl":null,"permalink":"/cookiecutter-template-for-google-cloud-run/","section":"Posts","summary":"While working on a couple of projects that would be deployed on Google Cloud Run, I realized that several of them tend to share a similar structure. Given the number of repositories I typically handle on a personal basis, as well as the amount of context switching I need to move between projects, it would be ideal for all such projects to be automated as much as possible.\n","title":"Cookiecutter template for Google Cloud Run","type":"posts"},{"content":"A classic move to reduce the attack surface of Google Cloud instances is to follow the advice below:\nIf the services on an instance don\u0026rsquo;t need public IPs, don\u0026rsquo;t attach public IPs to such instances If an instance requires a public IP, ensure that only the specific ports that are required are exposed. Clamp down on the rest of the ports and ensure no ingress on them With these basic principles, it is easy to see how they would eventually lead to an architecture where users access the instances via a bastion host. A bastion host is an instance that allows users to ssh in from the \u0026ldquo;outside\u0026rdquo; world. The more critical instances would be linked together in a private network that is inaccessible from the outside (except for load balancers receiving traffic etc).\nHere are some of the better-explained articles on the topic:\nhttps://cloud.google.com/solutions/connecting-securely#bastion https://en.wikipedia.org/wiki/Bastion_host\nHowever, if we set up the architecture this way, how can we ssh into private instances from the outside world?
It would be unwise to first ssh into the bastion host and keep our private keys there so that we can ssh further. Doing that wouldn\u0026rsquo;t make sense; it would not increase security but instead make it worse.\nSo, one of the better ways to do this is to use a configuration option called ProxyCommand that is part of the ssh utility.\nLet\u0026rsquo;s take an example. Let\u0026rsquo;s say we have 2 instances:\nInstance 1: Public IP: 70.70.70.70 Private IP: 10.0.0.1 Instance 2: Private IP: 10.0.0.2 In order to ssh into Instance 2 from the outside world (e.g. my own local computer), I can run the command as follows:\nssh -o ProxyCommand=\u0026#34;ssh -W %h:%p 70.70.70.70\u0026#34; 10.0.0.2 With this command, we are ssh-ing into Instance 2 by jumping through Instance 1. (So if Instance 1 goes down, our ssh session would end as well.)\nBut rather than typing the above command over and over again, we might as well set the following in the ssh configuration file (~/.ssh/config)\nHost Bastion HostName 70.70.70.70 Port 22 User AdminUser IdentityFile ~/.ssh/id_rsa Host AppServer HostName 10.0.0.2 Port 22 User AdminUser IdentityFile ~/.ssh/id_rsa ProxyCommand ssh -W %h:%p Bastion So, if you type:\nssh AppServer It would get you into the server without you having to remember which params to add to the ssh command\n","date":"1 August 2019","externalUrl":null,"permalink":"/ssh-configurations-for-going-into-google-cloud-instances/","section":"Posts","summary":"A classic move to reduce the attack surface of Google Cloud instances is to follow the advice below:\nIf the services on an instance don’t need public IPs, don’t attach public IPs to such instances If an instance requires a public IP, ensure that only the specific ports that are required are exposed.
Clamp down on the rest of the ports and ensure no ingress on them With these basic principles, it is easy to see how they would eventually lead to an architecture where users access the instances via a bastion host. A bastion host is an instance that allows users to ssh in from the “outside” world. The more critical instances would be linked together in a private network that is inaccessible from the outside (except for load balancers receiving traffic etc).\n","title":"SSH configurations for going into Google Cloud Instances","type":"posts"},{"content":"There are various serverless compute solutions on the Google Cloud Platform; initially there were only App Engine and Google Cloud Functions. Google App Engine is a solution that allows you to focus on writing apps and lets Google take care of deployment/scaling/operations. Google Cloud Functions takes this a step further and allows you as a developer to write just plain old functions and let Google handle the rest, thereby making it easier to split your app functionality into parts that need to scale and parts that don\u0026rsquo;t.\nEach of the above has its advantages and disadvantages. The advantages are ease of use and getting an app deployed to production really quickly. However, both products introduce platform lock-in as well as the need for users to wait for platform support. The products only support a few languages (more are added as time goes on). Let\u0026rsquo;s take Google Cloud Functions; at the time of writing, one can use Node.js, Go and Python. Java support should be coming soon. But these are the only languages available, so if you use something more exotic like Erlang, R or even C++, you are out of luck.\nDuring Google Cloud Next 2019, Google announced the release of Google Cloud Run, which is essentially a product that allows you to run HTTP-based applications in Docker containers.
These Docker containers can be handed to Google, which manages and runs them. The containers can be scaled from 0 to 1000 instances, and the memory allocated to the containers is a setting that you can optionally adjust to handle application requirements.\nDeploying an R API to Google Cloud Run # To demonstrate this, we can create an HTTP application written in R. R has a library called plumber that can handle HTTP-based workloads. With it, you can build a web application that receives GET and POST requests, which are then handled within the application.\nThe code below is for app.R, which describes the main R application that handles the logic of the web application\n#* Echo back the input #* @param msg The message to echo #* @get /echo function(msg=\u0026#34;\u0026#34;){ list(msg = paste0(\u0026#34;The message is: \u0026#39;\u0026#34;, msg, \u0026#34;\u0026#39;\u0026#34;)) } #* Plot a histogram #* @png #* @get /plot function(){ rand \u0026lt;- rnorm(100) hist(rand) } #* Return the sum of two numbers #* @param a The first number to add #* @param b The second number to add #* @post /sum function(a, b){ as.numeric(a) + as.numeric(b) } An R file to handle dependency management in dep.R\ninstall.packages(\u0026#34;plumber\u0026#34;) An R file to start the R application in start.R\nlibrary(plumber) r \u0026lt;- plumb(\u0026#34;app.R\u0026#34;) r$run(port=8080, host=\u0026#34;0.0.0.0\u0026#34;) A Dockerfile to package the R API into a Docker container\nFROM r-base ADD dep.R . RUN Rscript dep.R ADD app.R start.R ./ EXPOSE 8080 CMD Rscript start.R With all that, we can then run the following commands (Docker image names and Cloud Run service names must be lowercase):\ndocker build -t gcr.io/{google-project-name}/r-api:0.0.1 .
docker push gcr.io/{google-project-name}/r-api:0.0.1 Google Cloud Run currently seems to only be able to deploy images from Google Container Registry, so we need to use that for now.\nWe can deploy the Google Cloud Run service with this:\ngcloud beta run deploy r-api --allow-unauthenticated --concurrency=1 --image=gcr.io/{google-project-name}/r-api:0.0.1 --memory=512Mi A few comments before continuing:\nR is a single-threaded language. When we build HTTP applications with the Plumber R library, the application can\u0026rsquo;t handle parallel web requests coming in at the same time. Another process manager/application would be needed to handle this. However, to reduce complexity, we can deploy the R application with concurrency set to 1; this means each container only handles 1 request at any point in time. Cloud Run, like Google Cloud Functions, faces the cold start issue. If the container is big or the application takes a while to start, the initial request latency will be higher With that, we have deployed an application onto Google Cloud Run. One issue you might face is needing to enable the APIs for the above services before you can even begin using them, but that should be relatively easy to resolve.\nBelow are some references on this:\nR-API codebase: https://github.com/hairizuanbinnoorazman/api-with-R Creating Async workloads on Google Pubsub: https://cloud.google.com/run/docs/tutorials/pubsub Some slides on some points to highlight when deploying R-API docker container to Google Cloud Run: https://docs.google.com/presentation/d/1M8EhARDBY33IefEz356NhdUkkSyUZo1tHZBkMt-NtpE ","date":"15 April 2019","externalUrl":null,"permalink":"/introduction-to-google-cloud-run/","section":"Posts","summary":"There are various serverless compute solutions on the Google Cloud Platform; initially there were only App Engine and Google Cloud Functions.
Google App Engine is a solution that allows you to focus on writing apps and lets Google take care of deployment/scaling/operations. Google Cloud Functions takes this a step further and allows you as a developer to write just plain old functions and let Google handle the rest, thereby making it easier to split your app functionality into parts that need to scale and parts that don’t.\n","title":"Introduction to Google Cloud Run","type":"posts"},{"content":"Recently, I\u0026rsquo;ve needed to automate the builds for a few of my Golang projects via Google Cloud Build. However, rather than building Docker containers, I needed Golang binaries instead, which meant that the CI/CD pipeline needed a Go environment/runtime to build them. With these CI/CD solutions, including private Golang packages/modules in said projects is usually quite troublesome. Private Golang packages usually come from private GitHub/Bitbucket/GitLab repos, and getting the go get command to fetch them successfully requires a few hacks here and there.\nLet\u0026rsquo;s go over an example of how to get this done:\nCreating a private golang package # We can have a private Golang module that consists of the following. This repo needs to be a private repo on one of the public git hosting services, e.g. GitHub, GitLab or Bitbucket. In my case, I was trying with GitLab; I didn\u0026rsquo;t try the other git providers.\nNote: This example uses Go modules, so I believe you would need go1.11 or above\nLet\u0026rsquo;s call this file: fakecars.go\npackage fakecars import ( \u0026#34;fmt\u0026#34; \u0026#34;time\u0026#34; ) // FakeCar represents a vehicle that can be used to modify cars type FakeCar struct { RegistrationNum string Wheel int Country string Date time.Time } // NewFakeCar creates a new vehicle.
One needs to provide a registration number for use func NewFakeCar(registerNum string) (FakeCar, error) { return FakeCar{ RegistrationNum: registerNum, Wheel: 4, }, nil } // Valid checks whether the vehicle is a valid vehicle that can be used func (f *FakeCar) Valid() error { if f.Wheel \u0026lt; 2 { return fmt.Errorf(\u0026#34;no vehicle can have fewer than 2 wheels\u0026#34;) } return nil } The following file is generated when we call go mod init gitlab.com/hairizuanbinnoorazman/fakecars; it is called go.mod\nmodule gitlab.com/hairizuanbinnoorazman/fakecars With all that, we have created a private Golang package called fakecars\nConsuming it in a project # We can create the \u0026ldquo;main\u0026rdquo; project in another repo. This project calls the private package and its function within its code base. This example project is called fakegarage\nLet\u0026rsquo;s call this main.go\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;gitlab.com/hairizuanbinnoorazman/fakecars\u0026#34; ) func main() { fmt.Println(\u0026#34;Yea!!\u0026#34;) hehe, _ := fakecars.NewFakeCar(\u0026#34;kjanjcnajkcn\u0026#34;) fmt.Printf(\u0026#34;%+v\u0026#34;, hehe) } If this is also going to be a private project, then it too will require go mod init XXX\nmodule gitlab.com/hairizuanbinnoorazman/fakegarage The next step is to run go get ... commands to retrieve the modules needed to build this project. You can\u0026rsquo;t just run go get gitlab.com/hairizuanbinnoorazman just like that, because the tool will stop when it attempts to authenticate. By default, go get disables terminal prompts. One way around this is to run the following command:\nenv GIT_TERMINAL_PROMPT=1 go get gitlab.com/hairizuanbinnoorazman/fakecars For some odd reason, this works inconsistently.
With this command, it prompts you to key in your username and password multiple times and ends up failing the first time. However, if you try again, it suddenly works fine; I\u0026rsquo;m not too sure why, but this post is not meant to explore why this happens. This applies to Linux environments.\nOn OSX machines, the credential helper helps with authenticating to private git repositories, so after setting the environment variable above, the package is installed without you having to key in the username and password over and over again.\nThe more \u0026lsquo;official\u0026rsquo; approach from many other blog posts/guides out there is to do the following instead:\nGenerate an ssh key with the command: ssh-keygen -o -t rsa -b 4096 -C \u0026quot;XXX@gmail.com\u0026quot; Add the ssh key as a deploy key to the fakegarage and fakecars projects Run the following command: git config --global --add url.\u0026quot;git@gitlab.com:\u0026quot;.insteadOf \u0026quot;https://gitlab.com/\u0026quot;. This results in the private packages being fetched via ssh rather than over https, which lets you skip user authentication entirely. At the end of this process, the go.mod file for the fakegarage project turns into something like this:\nmodule gitlab.com/hairizuanbinnoorazman/fakegarage require gitlab.com/hairizuanbinnoorazman/fakecars v0.0.0-20190224070000-fffffffffff A go.sum is also generated to lock the versions of the packages used in the project\nNow with all this set up, we should be able to run a go build . command safely. It should compile a binary, and we should be able to run the binary with few issues.\nPrepping it for CI/CD in Google Cloud Build # On CI/CD platforms like Google Cloud Build, one doesn\u0026rsquo;t expect or want interactivity. You would expect to just push code into a git repository.
After doing so, the build system should build and compile the solution.\nThis means that setting env GIT_TERMINAL_PROMPT=1 won\u0026rsquo;t work for this workflow. We need to go with the official approach to handling private Go modules, which uses ssh to fetch the packages instead. That also means that we somehow need to add ssh keys to the build pipeline. Doing so in plain text might not be safe, so we would ideally use another service (Google Cloud KMS) to encrypt the keys.\nThe command below does the encryption. One would need to set up a keyring test and a key test1 beforehand.\ngcloud kms encrypt \\ --key test1 \\ --keyring test \\ --location global \\ --plaintext-file id_rsa \\ --ciphertext-file id_rsa.enc With that, we can then properly test a workflow that creates the automated Golang build pipeline.\nsteps: - name: \u0026#34;gcr.io/cloud-builders/gcloud\u0026#34; args: - kms - decrypt - --ciphertext-file=id_rsa.enc - --plaintext-file=/root/.ssh/id_rsa - --location=global - --keyring=test - --key=test1 volumes: - name: \u0026#34;ssh\u0026#34; path: /root/.ssh - name: \u0026#34;golang:1.11.4\u0026#34; entrypoint: \u0026#34;bash\u0026#34; args: - \u0026#34;-c\u0026#34; - | ssh-keyscan gitlab.com \u0026gt; /root/.ssh/known_hosts git config --global --add url.\u0026#34;git@gitlab.com:\u0026#34;.insteadOf \u0026#34;https://gitlab.com/\u0026#34; chmod 0600 /root/.ssh/id_rsa go build -o main-test . volumes: - name: \u0026#34;ssh\u0026#34; path: /root/.ssh artifacts: objects: location: \u0026#39;gs://testing-golang-builds/\u0026#39; paths: [\u0026#39;main-test\u0026#39;] With that code, you should have set up the full workflow.
There are plenty of fixed values used here, so one would replace them with variables that can be injected to fit the use case.\nReferences # Here are some references used for creating this example\nhttps://github.com/golang/go/issues/26134 https://cloud.google.com/cloud-build/docs/quickstart-go https://cloud.google.com/cloud-build/docs/configuring-builds/store-images-artifacts ","date":"1 March 2019","externalUrl":null,"permalink":"/private-go-modules-in-google-cloud-build/","section":"Posts","summary":"Recently, I’ve needed to automate the builds for a few of my Golang projects via Google Cloud Build. However, rather than building Docker containers, I needed Golang binaries instead, which meant that the CI/CD pipeline needed a Go environment/runtime to build them. With these CI/CD solutions, including private Golang packages/modules in said projects is usually quite troublesome. Private Golang packages usually come from private GitHub/Bitbucket/GitLab repos, and getting the go get command to fetch them successfully requires a few hacks here and there.\n","title":"Private Go Modules in Google Cloud Build","type":"posts"},{"content":"As one writes several Python applications targeting the Google Cloud Functions platform, it becomes increasingly obvious that the more common bits of code should be pulled out into their own library. Let\u0026rsquo;s look at an example of why.\nLet\u0026rsquo;s say you have a small function that integrates with the Slack APIs. It takes in JSON blobs and manipulates them before forwarding them to Slack. When you do your first integration between Slack and another service, it seems pretty simple and straightforward; just refer to the JSON used by that service.
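For illustration, a one-off integration might assemble the Slack payload inline, roughly as in the sketch below. It assumes a Slack incoming webhook that reads the message from a `text` field; the function name `build_slack_message` is hypothetical, and any extra fields are simply passed through:

```python
import json


def build_slack_message(text, **extra_fields):
    # Hypothetical helper: the message goes into the text field, and any
    # other fields a particular integration needs are copied in unchanged.
    payload = {'text': text}
    payload.update(extra_fields)
    # Serialise to the JSON string that would be POSTed to the webhook URL.
    return json.dumps(payload)
```

Each service that talks to Slack tends to grow its own copy of a function like this, and that duplication is what a shared client package removes.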
However, after doing the integration for the fifth time, it becomes clear that you need some common code to build up the structure of the JSON blob sent to the service. We need some sort of client package to do this.\nThere are a few benefits to having a client package; the consumers of said services do not need to look too deeply into what inputs have to be sent over. They can just import the client library and begin to use the service with relative ease, compared to the alternative of having to build the clients themselves.\nWays to have client packages # In Python, there are several ways to import packages. The most common way is to install packages from the public Python package repositories, but that only works for public packages. If one wants private Python packages, the alternatives are to put them on a private PyPI package server, use private git repositories (you can install Python packages from a git repository without building the package) or host the packages on your own pypi-server setup.\nThis post explores how to set up private Python package hosting with your own pypi-server setup\nBuilding the Sample Package # Refer to the following git repository:\nhttps://github.com/hairizuanbinnoorazman/local-pypi-server/tree/master/sample\nUsing packages such as requests as a reference, we can copy some code structures from them to create our sample Python package. The sample package here only has one function: sample_print_stmt. It takes a string input, prints it out and returns it.\nThe only folder that matters here is the sample folder. The sample.egg-info and dist folders are generated while building the Python package. To build the package, we run the following command:\npython setup.py sdist bdist_wheel Building pypi-server docker image # There is a Python package that provides a PyPI server.
It is available in this repo: https://github.com/pypiserver/pypiserver. Within this repo, we can see that it also provides a Dockerfile and Docker images containing the pypi-server codebase to serve Python packages. We can then build our own Docker image based on that.\nFROM pypiserver/pypiserver ADD ./sample/dist /data/packages ADD ./.htpasswd / ENTRYPOINT [\u0026#34;pypi-server\u0026#34;, \u0026#34;-p\u0026#34;, \u0026#34;8080\u0026#34;, \u0026#34;-P\u0026#34;, \u0026#34;/.htpasswd\u0026#34;, \u0026#34;-a\u0026#34;, \u0026#34;update,download\u0026#34;, \u0026#34;./packages\u0026#34;] The Python packages are served from the location specified in the entrypoint section of the Dockerfile. After running the build command to build the Python packages, we can just add the built archives to that directory.\nIn order to \u0026ldquo;protect\u0026rdquo; our Python package repository, we create an htpasswd file so that consumers and uploaders need to provide a username and password to the service. With the -a flag, we can require a username and password whenever an update or download happens.\nWe can build the container and run it accordingly.\ndocker build -t pyserver . docker run -p 8000:8080 pyserver With the above Docker commands, we now have a local pypi-server serving Python packages on port 8000.\nUsing the sample package from private pypi-server # To try installing it, we can run the following command (I assume that you know how to create your own Python virtual environment):\npip install --index-url=http://localhost:8000/simple sample The sample package is installed with that. We can then import the package and use the function.\nimport sample sample.sample_print_stmt(\u0026#34;caacc\u0026#34;) It works with Pipenv too.
The only thing you need to do before installing it is to add the following source to the Pipfile, after the original PyPI source, as an alternative source that pip can use to find Python packages.\n[[source]] url = \u0026#34;http://localhost:8000/simple\u0026#34; verify_ssl = false name = \u0026#34;sample\u0026#34; After this step, you can run the following to install the sample package. It should install, and even lock into the requirements, without any issues.\npipenv install sample Additional thoughts # As I was sharing this with other engineers, a few mentioned that with the pip and pipenv tools, you can potentially just install the package directly from its git repository. This is possible and can be done for both public and private Python git repositories.\nHowever, after doing a few quick tests, I realized that installing Python packages this way made me lose IntelliSense for the entire package when coding in Visual Studio Code. (I\u0026rsquo;m not sure about other editors; I didn\u0026rsquo;t try them.) For a small package, losing IntelliSense while coding might be OK, but for a big package this is going to be a huge drawback. I\u0026rsquo;m currently heavily reliant on IntelliSense, and if it\u0026rsquo;s not able to suggest the next possible direction for my code, it will hamper my progress significantly.\nAnother minus point of installing Python packages directly from git repositories is that they don\u0026rsquo;t seem to install sub-dependencies alongside the targeted package. Take pandas as an example. The pandas Python library depends heavily on the numpy Python package.
However, if you try to install it directly from its git repository, the following error comes up:\nObtaining pandas from git+https://github.com/pandas-dev/pandas#egg=pandas Cloning https://github.com/pandas-dev/pandas to /Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/src/pandas Complete output from command python setup.py egg_info: Traceback (most recent call last): File \u0026#34;/Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/lib/python3.7/site-packages/pkg_resources/__init__.py\u0026#34;, line 357, in get_provider module = sys.modules[moduleOrReq] KeyError: \u0026#39;numpy\u0026#39; During handling of the above exception, another exception occurred: Traceback (most recent call last): File \u0026#34;\u0026lt;string\u0026gt;\u0026#34;, line 1, in \u0026lt;module\u0026gt; File \u0026#34;/Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/src/pandas/setup.py\u0026#34;, line 737, in \u0026lt;module\u0026gt; ext_modules=maybe_cythonize(extensions, compiler_directives=directives), File \u0026#34;/Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/src/pandas/setup.py\u0026#34;, line 480, in maybe_cythonize numpy_incl = pkg_resources.resource_filename(\u0026#39;numpy\u0026#39;, \u0026#39;core/include\u0026#39;) File \u0026#34;/Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/lib/python3.7/site-packages/pkg_resources/__init__.py\u0026#34;, line 1142, in resource_filename return get_provider(package_or_requirement).get_resource_filename( File \u0026#34;/Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/lib/python3.7/site-packages/pkg_resources/__init__.py\u0026#34;, line 359, in get_provider __import__(moduleOrReq) ModuleNotFoundError: No module named \u0026#39;numpy\u0026#39; ---------------------------------------- Command \u0026#34;python setup.py egg_info\u0026#34; failed with error code 1 in /Users/XXX/.local/share/virtualenvs/url-checker-VRYs4T2O/src/pandas/ My guess is that during python packaging, some work is done to
inform the pip tool regarding the dependencies of the package as well.\n","date":"1 February 2019","externalUrl":null,"permalink":"/setting-up-a-private-pypi-server/","section":"Posts","summary":"As one writes several python applications to be targeted on the Google Cloud Functions platform, it becomes increasingly obvious to pull out the more common bits of code out into its own library. Let’s have an example on the reason for this.\n","title":"Setting up a Private Pypi Server","type":"posts"},{"content":"Recently, Google has been launching a couple of certification programs that help people demonstrate their knowledge and expertise with the Google Cloud Platform. At the moment (January 2019), there are 7 certifications, including Professional Cloud Architect, Cloud Developer and Professional Data Engineer.\nSo with all that happening, I decided to go for the Cloud Developer certification and see what being certified is like. I have listed all the resources that I used and relied on before going for the exam so that you can use them to prepare for it too. I may have over-prepared in my case, as I was aiming for the Google Cloud Developer certification in the future.\nTips on preparing for the exam # Here are some of the Coursera courses that should help you prepare for the exam. The Coursera courses give a quick overview of the features available in Google Cloud Platform. They use Qwiklabs so that you can quickly try out one or two features on the platform and get familiar with them (without needing your own GCP account or paying to try out the service) https://www.coursera.org/specializations/gcp-architecture https://www.coursera.org/specializations/developing-apps-gcp Qwiklab Quests https://www.qwiklabs.com/quests/23 Be familiar with the Google Cloud Console. Rather than sticking only to the Coursera courses and Qwiklabs, it really helps to use the platform on a daily basis. 
Use it and try developing apps that use specific features of the platform. It is relatively easy to try out features, as there is plenty of sample code in the Google Cloud Platform documentation. For the Google Cloud Developer certification, the exam tests practically every aspect of the platform. It\u0026rsquo;s especially vital to appreciate the different data products in Google Cloud Platform, such as Cloud Spanner, Cloud Datastore, Cloud SQL etc. E.g. in what situations you would use one of the products over the rest, and why you would migrate to another data product as your requirements grow. ","date":"26 January 2019","externalUrl":null,"permalink":"/preparing-for-google-cloud-developer-certification/","section":"Posts","summary":"Recently, Google has been launching a couple of certification programs that help people demonstrate their knowledge and expertise with the Google Cloud Platform. At the moment (January 2019), there are 7 certifications, including Professional Cloud Architect, Cloud Developer and Professional Data Engineer.\n","title":"Preparing for Google Cloud Developer Certification","type":"posts"},{"content":"There are various tools out there that make deploying applications easier. Some of these tools help developers and organizations reach the \u0026ldquo;12 factor app\u0026rdquo; standard, which describes applications that are explicitly designed to be able to scale where needed.\nNowadays, many people turn to Docker to achieve some of the goals of 12 factor app design (e.g. using Dockerfiles, which declaratively list all the dependencies needed by the application and its operations). 
This practice of declaring an application\u0026rsquo;s dependencies in files eventually leads teams to build immutable server images, which is where devops tools like Packer and Terraform can help tremendously.\nHowever, let\u0026rsquo;s say that we are restricted from using containers and can only rely on virtual machines in a public cloud. What could we depend on?\nLet\u0026rsquo;s go through several tools that can help developers achieve this goal:\nAnsible. A tool that allows developers to declaratively configure a server more easily. Ansible lets one declaratively install packages, add configuration files from template files and run admin commands. It is possible to use bash scripts to get all of these settings onto the server, but shell scripts are not easy to read, debug or maintain. Scripts may be easier to write, but Ansible comes with a whole bunch of functionality that lets one declare the actions required to install the needed software onto the server. Terraform. A tool that allows developers to create their needed infrastructure on a public cloud. (There are other uses for this tool, but generally, it is used to bootstrap/maintain infrastructure.) Terraform lets one maintain a set of files which describe their application infrastructure, and takes responsibility for ensuring that the actual infrastructure matches what is described in those files. Packer. A tool that creates custom images. To scale applications on the cloud, one needs some sort of server template that the cloud can use to create multiple copies of the application for horizontal scalability. 
To create that server template in a reproducible manner, we can use scripts to build the images (rather than setting up the servers manually and then turning one into a template that the cloud vendor can use to support scaling needs). Case Study: Installing Nginx # Let\u0026rsquo;s set up a simple example that uses the above devops tools to install the Nginx webserver. We will try the example below on the Google Cloud Platform, but I imagine it would be relatively easy to write an AWS version of these configuration files.\nStep 1: Setup of Ansible Scripts # Since server configuration is easier to do in Ansible than by fiddling around with the other two tools, we start there. Interestingly, Packer has Ansible provisioners that we can make use of. Refer to the documentation here: https://www.packer.io/docs/provisioners/ansible-local.html\nUsing Ansible means installing Python and Ansible on the machine that runs the commands.\n- hosts: default become: true tasks: - name: Install nginx on server apt: name: \u0026#34;{{ packages }}\u0026#34; update_cache: yes vars: packages: - nginx As mentioned, when using Ansible, we want to run it remotely so that Ansible itself does not need to be installed on the machine being configured (it is a fairly heavy application to install).\nWith the Ansible playbook in place, we can build the Packer layer that makes use of it. 
(Naturally, the Ansible playbook could be much more complex, but that is a topic for another time.)\nStep 2: Packer wrapping Ansible # We can then write the Packer JSON file that makes use of the above Ansible playbook.\n{ \u0026#34;provisioners\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;ansible\u0026#34;, \u0026#34;playbook_file\u0026#34;: \u0026#34;./playbook.yml\u0026#34; } ], \u0026#34;builders\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;googlecompute\u0026#34;, \u0026#34;account_file\u0026#34;: \u0026#34;account.json\u0026#34;, \u0026#34;project_id\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;source_image\u0026#34;: \u0026#34;debian-9-stretch-v20181210\u0026#34;, \u0026#34;ssh_username\u0026#34;: \u0026#34;packer\u0026#34;, \u0026#34;zone\u0026#34;: \u0026#34;us-central1-a\u0026#34; } ] } Do note that Packer needs a credentials file to build the virtual machine image; in our case, it is named account.json. We also need to provide the project id that Packer uses to build the image (left empty in the file above). Under the hood, Packer contacts the GCP APIs and creates a temporary VM on Google Compute Engine. It then runs the Ansible playbook to configure the server and, once that succeeds, saves the server as a VM image on the platform, giving you immutable images of the application to choose from.\nStep 3: Terraform to bootstrap Infrastructure # To tie all the above efforts together, we use Terraform. This tool makes it relatively easy to declare your required infrastructure. 
As changes are made to the declarations, the tool tries its best to sync the actual state of the infrastructure on the cloud platform with the declared state.\nvariable \u0026#34;project\u0026#34; {} provider \u0026#34;google\u0026#34; { credentials = \u0026#34;${file(\u0026#34;account.json\u0026#34;)}\u0026#34; project = \u0026#34;${var.project}\u0026#34; region = \u0026#34;us-central1\u0026#34; zone = \u0026#34;us-central1-c\u0026#34; } resource \u0026#34;google_compute_instance\u0026#34; \u0026#34;vm_instance\u0026#34; { name = \u0026#34;terraform-instance\u0026#34; machine_type = \u0026#34;f1-micro\u0026#34; boot_disk { initialize_params { image = \u0026#34;packer-1545674155\u0026#34; } } tags = [\u0026#34;http-server\u0026#34;] labels = { \u0026#34;infra\u0026#34; = \u0026#34;automated\u0026#34; } network_interface { network = \u0026#34;${google_compute_network.vpc_network.self_link}\u0026#34; access_config = {} } } resource \u0026#34;google_compute_network\u0026#34; \u0026#34;vpc_network\u0026#34; { name = \u0026#34;terraform-network\u0026#34; auto_create_subnetworks = \u0026#34;true\u0026#34; } resource \u0026#34;google_compute_firewall\u0026#34; \u0026#34;web-firewall\u0026#34; { name = \u0026#34;web-firewall\u0026#34; network = \u0026#34;${google_compute_network.vpc_network.self_link}\u0026#34; allow { protocol = \u0026#34;tcp\u0026#34; ports = [\u0026#34;80\u0026#34;] } source_ranges = [\u0026#34;0.0.0.0/0\u0026#34;] } As an example, the above config file declares the required network and firewall rules, as well as the machine type (which determines the memory size and CPU). More complex resources, such as storage buckets and databases, can be declared as well.\nA more complex example will be provided on this blog in the future.\nResources # The code snippets above are incomplete; they only cover the files of interest for this blog post. 
To be on the safe side, rely on the full set of code in the GitHub repo here:\nhttps://github.com/hairizuanbinnoorazman/infra-as-code-examples/tree/master/Example1\n","date":"14 January 2019","externalUrl":null,"permalink":"/devops-tools-with-google-cloud-platform/","section":"Posts","summary":"There are various tools out there that make deploying applications easier. Some of these tools help developers and organizations reach the “12 factor app” standard, which describes applications that are explicitly designed to be able to scale where needed.\n","title":"Devops Tools with Google Cloud Platform","type":"posts"},{"content":"","date":"10 January 2019","externalUrl":null,"permalink":"/categories/conference/","section":"Article Categories","summary":"","title":"Conference","type":"categories"},{"content":"This list will be updated as the year goes by.\nA list of conferences, meetups and exhibitions to look out for in 2019:\nThis is a personal list that I am keeping track of; it mainly revolves around Golang, modern architecture technologies e.g. 
Cloud technologies etc, Python and even R (one of my first languages; I still keep a lookout on how it\u0026rsquo;s doing nowadays).\nConferences # RStudio Conference January 2019 https://www.rstudio.com/conference/ AWS Summit Singapore 2019 April 2019 Google Cloud Next 2019 April 2019 https://cloud.google.com/blog/products/gcp/mark-your-calendar-google-cloud-next-2019 DockerCon 2019 https://dockercon19.smarteventscloud.com/portal/newreg.ww KubeCon/CloudNativeCon Europe 2019 May 2019 https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2019/ Vue Conf 2019 May 2019 https://us.vuejs.org/ GopherCon 2019 July 2019 https://www.gophercon.com/ Refer to this site for more GopherCon events: https://github.com/golang/go/wiki/Conferences AWS re:Invent Nov/Dec 2019 No URL available yet ","date":"10 January 2019","externalUrl":null,"permalink":"/things-to-watch-out-for-in-2019/","section":"Posts","summary":"This list will be updated as the year goes by.\nA list of conferences, meetups and exhibitions to look out for in 2019:\nThis is a personal list that I am keeping track of; it mainly revolves around Golang, modern architecture technologies e.g. Cloud technologies etc, Python and even R (one of my first languages; I still keep a lookout on how it’s doing nowadays).\n","title":"Things to watch out for in 2019","type":"posts"},{"content":"This is a continuation of the previous blog post.\nTo summarize that post:\nIt is too painful to have people respond and react to report generation and compilation It is too expensive to have a machine lying around to pick up the slack and automate the reports; serverless solutions (pay per use) could be a useful model for running automated reports. Example scenario: 3 reports are generated, which are then compiled into a single report. The 3 reports are processed when their data files are dropped into the storage buckets. 
The event generated from the drop automatically runs the report Compiling reports # The next part of resolving the situation above (read the previous blog post, part 1, for more details) is to compile the report. There are several ways to handle this, each with its own advantages and drawbacks. We use the term subreport to refer to each report in the initial set, which then need to be compiled into a final report. These are just possible solutions: combinations of products that can be used to achieve the final goal of checking the subreports and then compiling them into the final report.\nSolution 1: Each time a subreport is submitted, we run the function that checks it. After that, we save the fact that the subreport was checked into some sort of data storage (a database). Every hour, we run another function that checks the database; once all the subreports are ready, it compiles the reports and we are done for the day.\nIssue: We would probably need to rely on another service, Google Cloud Scheduler (just released), to maintain the cron schedule that triggers the Google Cloud Function on an hourly basis. Solution 2: Each time a subreport is submitted and checked, we also run checks on the other subreports. Once they are all complete, we publish a message on Google Pubsub. This triggers another Google Cloud Function that compiles the subreports into the final report.\nIssue: With this method, we need to recheck all the subreports on every submission. That results in wasted computation, as we keep rechecking all the subreports each time. 
It would be ideal to store which subreports have already been checked, so that computation is not wasted on rechecking the data. Depending on the size of the data being checked, rechecking increases the time needed to process each subreport, which inevitably increases the cost of running the automation. The whole point of going down the serverless route is to keep the cost of the services as low as possible. The solution that was eventually picked (considering that Google Cloud Scheduler was not yet available when this was created) is the following. It is a mix of solution 1 and solution 2 proposed above.\nSubmitting each subreport into the Google Cloud Storage bucket triggers a Google Cloud Function that runs a check on the subreport. Once the check is complete and passes, the function stores that information in Google Cloud Datastore (a database). The last step of checking the subreport is a query on Google Cloud Datastore for the day\u0026rsquo;s records: have all the subreports been checked, and have they all passed, so that compilation can be done? If the checks are all good, a message is dropped on Google Pubsub, which then triggers the Google Cloud Function that runs the compilation. 
The compilation function, triggered via the message on Google Cloud Pubsub, compiles the report and then sends it out via Slack, email etc. The full source code for the above is available in the repo here: https://github.com/hairizuanbinnoorazman/gcf-analytics/tree/941c813b3ebefdd0640c098447ba337d0902c034\nSlides on this are available here: https://docs.google.com/presentation/d/1trt8SyQYSgUfx8AfHZ7Pt8_VzfIqEsJerpQYqhQ-MIw/edit\n","date":"10 November 2018","externalUrl":null,"permalink":"/triggering-analytics-via-serverless-functions-part-2/","section":"Posts","summary":"This is a continuation of the previous blog post.\nTo summarize that post:\nIt is too painful to have people respond and react to report generation and compilation It is too expensive to have a machine lying around to pick up the slack and automate the reports; serverless solutions (pay per use) could be a useful model for running automated reports. Example scenario: 3 reports are generated, which are then compiled into a single report. The 3 reports are processed when their data files are dropped into the storage buckets. The event generated from the drop automatically runs the report Compiling reports # The next part of resolving the situation above (read the previous blog post, part 1, for more details) is to compile the report. There are several ways to handle this, each with its own advantages and drawbacks. We use the term subreport to refer to each report in the initial set, which then need to be compiled into a final report. 
These are just possible solutions: combinations of products that can be used to achieve the final goal of checking the subreports and then compiling them into the final report.\n","title":"Triggering analytics via Serverless Functions Part 2","type":"posts"},{"content":"Seeing how functions as a service change the way one looks at compute workloads makes me wonder whether individuals and companies could look at their analytics workloads and move them toward that same costing model.\n1-2 years ago, if someone told me that they needed to run some automation scripts written in Python or R, I would probably stretch my fingers and immediately begin deploying a Linux server. I would manually install all the dependencies needed and give the required users access to the server before continuing on my way. This meant that the server would run continuously. Nobody is going to keep shutting it down and then asking the engineers to recreate it over and over again; it would cost more to do it that way.\nFortunately, times have changed quite a bit since then. Other automation tools came along (e.g. Ansible, Packer, Terraform), then containers (e.g. Docker) and now, the big movement in the industry: functions as a service (FaaS).\nLet\u0026rsquo;s say we chose to deploy our analytical workload onto a cloud provider\u0026rsquo;s FaaS offering. Just imagine writing a function, throwing it to a provider and letting the provider figure out how to run it. One no longer has to think about ensuring that the provisioned machine can take on all the analytical load at any given time, or about keeping the cost of provisioning that machine as low as possible.\nHowever, rather than keep going on about how awesome the FaaS model for running workloads is, let\u0026rsquo;s work through a sample application workload. 
Over here, it is demonstrated with Google Cloud Functions, but I would imagine it would work well with\nScenario # Let\u0026rsquo;s imagine the following scenario. We have a main analytics department that needs to compile a report from 3 other departments. Let\u0026rsquo;s nickname this main analytics department main. The other reports are produced by team A, team B and team C respectively (e.g. team A produces report A). The usual workflow is the following:\nmain requests for the subreports to be submitted team A sends in report A team B sends in report B team C sends in report C main team compiles the report and submits it to the business team with their insights, which empower the whole business to make data driven decisions Unfortunately, the above is usually just the ideal case. More likely than not, the following happens:\nmain requests for the subreports to be submitted team A sends in report A team B sends in report B report A has many errors; it needs to be corrected by team A and resubmitted team C A reminder email needs to be sent team B An error is spotted by an analyst on the main team (summing some of the columns shows that the data wasn\u0026rsquo;t filtered properly etc) - the report needs to be resubmitted once more etc\u0026hellip; The process, if done well, could take just over a day, but with the many potential issues that can come up, the ideal scenario is highly unlikely. Due to the back and forth on report requests, a \u0026ldquo;simple\u0026rdquo; report that was expected to be done in 2-3 days could end up taking a whole week or longer before it gets submitted. 
It gets kind of weird when you imagine that the weekly report a business team receives is over a week overdue, making it harder to correlate actions with the reported results.\nHumans are prone to errors; an error-free report this month does not guarantee error-free reports moving forward. Other factors can come into play too: the main team could request the data too late, leaving the respective team without enough time to compile the report they are supposed to deliver etc.\nHow can we try fixing this? # Part of the reason why the report can take longer is that humans are needed to check it. And sometimes, even if there is a script, a human still needs to step in to run it. This means that the teams producing the subreports (in our case, reports A, B and C) face some lag before they know whether their reports are correct and usable for the analysis in the main report.\nBased on the above, it would be nice if the people working on the subreports got instant feedback, so that they can continue working on them immediately instead of waiting a day or two to hear what went wrong with their report.\nAnd here\u0026rsquo;s where part 1 of the solution comes in: Google Cloud Functions can be triggered when a report is dropped into a storage bucket. 
So a team working on a report could upload it to a storage bucket, which immediately triggers a function that checks the submitted report.\nYou can view the following possible options on the various triggers one can use when creating the cloud function:\nNow that we have our trigger set up, we can write the function that runs to check our data.\nLet\u0026rsquo;s say that this is the main logic we want to run on each dataset.\nWe want to ensure that all the columns specified here are present in the dataset We want to check that there is at least one row of data in each dataset Over here, we assume that the data has been read via the pandas library into a pandas DataFrame, which allows us to manipulate it accordingly.\nimport pandas as pd def run_check(data): assert isinstance( data, pd.DataFrame), \u0026#34;Expected data to be a pandas dataframe\u0026#34; errors = [] # Number of rows is more than 0 data_shape = data.shape if data_shape[0] \u0026lt;= 0: errors.append(\u0026#34;Empty dataset\u0026#34;) # Column check expected_keys = [\u0026#34;id\u0026#34;, \u0026#34;data\u0026#34;, \u0026#34;source\u0026#34;, \u0026#34;target\u0026#34;, \u0026#34;state\u0026#34;] columns = data.columns for key in expected_keys: if key not in columns: errors.append(\u0026#34;{} column is missing\u0026#34;.format(key)) return errors To tie the above functionality together, we also need to note that the Google Cloud Storage trigger does not provide the function with the actual file to check. It only provides the metadata of the file that was dropped into the bucket. 
Some examples of the available data are:\nName of the file being dropped \u0026ldquo;Folder\u0026rdquo; of the file being dropped in the bucket Timestamp of the file being dropped Information on whether the file has any expiry date before it gets automatically deleted etc Further information on these can be found here: https://cloud.google.com/functions/docs/writing/background https://cloud.google.com/storage/docs/json_api/v1/objects\nWe need to do the following in order to run our checks end to end (this includes informing the team responsible for the report):\nPull in our configuration/secret files from somewhere. A possible place to store our config keys is in a bucket. However, one can probably store them in more secure places; they just need to be accessible to the function Download the file from the bucket into the Cloud Function so that it can process it further and run its checks Send error/success messages to the communication channel. In this example, it is done via Slack. def main(data, context): \u0026#34;\u0026#34;\u0026#34;Background Cloud Function to be triggered by Cloud Storage. Args: data (dict): The dictionary with data specific to this type of event. context (google.cloud.functions.Context): The Cloud Functions event metadata. 
\u0026#34;\u0026#34;\u0026#34; bucket_id = data[\u0026#39;bucket\u0026#39;] file_name = data[\u0026#39;name\u0026#39;] assert isinstance(bucket_id, str), \u0026#34;Bucket id provided is not a string\u0026#34; assert isinstance(file_name, str), \u0026#34;Filename provided is not a string\u0026#34; # Retrieve configuration files client = storage.Client() bucket = client.get_bucket(\u0026#39;gcf-test-analytics-demo1\u0026#39;) blob = bucket.get_blob(\u0026#39;config/config.json\u0026#39;) keys = blob.download_as_string() keys_json = json.loads(keys) # Retrieve slack channel id slack_token = keys_json[\u0026#39;slack_token\u0026#39;] slack_channel_name = keys_json[\u0026#39;slack_channel_name\u0026#39;] channel_id = slack.get_channel_list(slack_token, slack_channel_name) slack.send_text_to_channel( slack_token, channel_id, \u0026#34;Received csv file. Will begin checking\u0026#34;) # Download the file from the bucket that triggered the event and process it data_blob = client.get_bucket(bucket_id).get_blob(file_name) try: data_blob.download_to_filename(\u0026#34;/tmp/{}\u0026#34;.format(file_name)) df = pd.read_csv(\u0026#34;/tmp/{}\u0026#34;.format(file_name)) except Exception as e: # Report the failure instead of running the check on missing data logging.error(e) slack.send_text_to_channel( slack_token, channel_id, \u0026#34;Unable to read csv file: {}\u0026#34;.format(e)) return err_list = analytics_check.run_check(df) if len(err_list) \u0026gt; 0: error_text = \u0026#34;\u0026#34; for item in err_list: error_text = \u0026#34;{}\\n{}\u0026#34;.format(error_text, item) slack.send_text_to_channel( slack_token, channel_id, error_text) else: slack.send_text_to_channel(slack_token, channel_id, \u0026#34;All good\u0026#34;) The full codebase for this can be found here: https://github.com/hairizuanbinnoorazman/gcf-analytics/tree/demo1\nConcluding part 1 # The above code should handle the case where submitted subreports are checked immediately and the feedback from those checks is returned to the team involved. E.g. now the team that sent the report does not need to \u0026ldquo;recheck\u0026rdquo; their work. 
If the functional check says it\u0026rsquo;s ok, then it should probably be ok for them to move on with their life without worrying whether the \u0026ldquo;main\u0026rdquo; analytics team will come back to them requesting even more information or changes to the report they sent in.\nAnd with that, we\u0026rsquo;re done with part 1 of our long solution. There is another part which covers the second portion: the compilation of our subreports.\n","date":"6 November 2018","externalUrl":null,"permalink":"/triggering-analytics-via-serverless-functions-part-1/","section":"Posts","summary":"Seeing how functions as a service change the way one looks at compute workloads makes me wonder whether individuals and companies could look at their analytics workloads and move them toward that same costing model.\n","title":"Triggering analytics via Serverless Functions Part 1","type":"posts"},{"content":"Data engineering work is usually one of the fundamental pieces when it comes to report generation in a business. Understanding the data that flows through the business, and maintaining all the scripts that pull and merge that data, makes the job way harder than one might expect. You are not expected to just be a script junkie; you are expected to be an expert in your domain, understanding the different nuances and assumptions each line of script imposes on the processing of that data.\nAcquiring the initial set of requirements and writing such automation scripts is usually considered the easiest bit. The harder bits are maintenance and upgrades, as well as ensuring that the scripts can be deployed to their respective users. If projects are prototyped rather than properly engineered, one can be pretty sure that there will be hiccoughs and plenty of engineering hours (fancy some late nights and overtime?) 
spent ensuring that the scripts are ready and running.\nLet\u0026rsquo;s go through several of the more important things to consider when automating at that stage.\nThe Beginning Using Git Package versioning Proper Commenting Vectorized Operations Decoupled Data Sources Testing Algorithms Proper Config Management Automate Documentation Survived the initial hell The Beginning # For most people, departments or even companies, the beginning usually means writing a script that quickly pulls in the numbers and dumps them into an output file. The output could be an Excel file, a CSV file or some other format, with the aim of feeding it into a visualization tool of some sort.\nScripts are both the greatest blessing and the greatest curse for such teams. Because they are so flexible, scripts can automate essentially every aspect of the job (generating that dreaded weekly report, downloading reports from email and saving them to a shared online folder for the team). That same flexibility means scripts can easily become more complex than expected, which eventually leaves a huge amount of technical debt for the engineering team to handle.\nWith that in mind, let\u0026rsquo;s look at some ways to reduce that debt, or at least make such jobs slightly easier.\nHere are some of the ways to do so:\nUsing Git # Git is not GitHub (repeat this 3 times to yourself). Most people get their first taste of git via GitHub, so it is understandable to equate the two. However, git is just a tool for version control of any text-based document (it handles binary files as well, but it is not as useful in that regard). Once git is installed, running git init gives you a local repository that you can experiment with and control.\nWith git, we can experiment with the code base. 
This is done by branching our code and testing our changes and assumptions on separate branches. If those tests fail or our assumptions turn out to be wrong, we can just as easily revert the changes.\nPackage versioning # Package versioning is one of those problems that is not a problem until you get bitten by it; but once you get bitten, it is really painful to recover from. You can feel this pain first-hand by volunteering to upgrade a 5-7 year old Django or Ruby on Rails application that has not been touched or maintained regularly. I swear you will feel like rage quitting halfway through.\nPart of the problem with upgrading such applications is that the packages an application uses may depend on conflicting versions of the same package.\nLet\u0026rsquo;s take the example of a web application with the following dependencies:\npackage A (v1.0.0) -\u0026gt; dependent on package C (v0.1.0) package B (v1.0.0) -\u0026gt; dependent on package C (v0.1.0) Let\u0026rsquo;s say we find that there is an update to package A which requires an upgrade of package C.\npackage A (v1.1.0) -\u0026gt; dependent on package C (v0.2.0) package B (v1.0.0) -\u0026gt; dependent on package C (v0.1.0) Now, package B would also need to be upgraded since it also depends on package C. But what if package B\u0026rsquo;s author has not been updating that package, and version 0.2.0 of package C breaks package B? Now the packages added to the project have conflicting requirements.\nImagine this at the scale of an app with hundreds of packages, each with its own dependencies; the problem now grows exponentially.\nWhen it comes to languages like Python, there are various tools that handle this. 
A requirements.txt file was one of the better ways of doing it, but ever since pipenv came out, it has become the better way of handling package versioning in a project. Further details will be covered in another blog post rather than here.\nProper Commenting # Comments are a generic way of adding context to codebases. Sometimes, due to the structure of the code, comments are used to explain what a section is trying to do. Unfortunately, this is not the most effective use of comments: when code needs comments to explain what it\u0026rsquo;s trying to do, it\u0026rsquo;s a sign that the section needs some sort of abstraction, e.g. breaking it out into a separate function.\nInstead, it is much more valuable to use comments to explain why a certain section of code was introduced. People usually get the what of the code by reading the code itself rather than blindly trusting the comments. The why gives the context of why the code base was designed in a certain way.\nSome example comments could be:\ndef a_random_function(random_number): \u0026#34;\u0026#34;\u0026#34; Checks and returns a corrected ID from the database :param random_number: An integer that is the ID of the record being checked :type random_number: int :returns: An integer representing the corrected ID for reference \u0026#34;\u0026#34;\u0026#34; # Refer to #182. A check for random numbers greater than 82 is necessary as the database did not record values for those record IDs # Closest match for this was record 72, which would have returned 77 if random_number \u0026gt; 82: return 77 else: return random_number + 5 With the above comment, we now understand why that check was added. 
We would eventually understand what the function does from its documentation, but we wouldn\u0026rsquo;t know why there is a check for values greater than 82 unless that context was provided.\nVectorized Operations # When one starts programming, the usual way of running a piece of code across each item in a collection is to use a loop. In our case, we can think of each row of a dataframe (a term that applies to both R and Python) as an item in a collection (a set of rows together forms a dataframe). Looping works fine for smaller dataframes and may appear easier to understand, but when each dataframe needs a lot of manipulation, it is easy to get confused.\nLet\u0026rsquo;s take a naive example: we want to add a new column to a dataset that holds the sum of two existing columns.\nThis is via loops\nimport pandas as pd df = pd.read_csv(\u0026#34;initial.csv\u0026#34;) # Let\u0026#39;s assume that the new column is in column 5 (position 4) df[\u0026#39;newColumn\u0026#39;] = 0 # iloc takes [row, column] for i in range(len(df)): df.iloc[i, 4] = df.iloc[i, 3] + df.iloc[i, 2] This is via python\u0026rsquo;s pandas apply\nimport pandas as pd df = pd.read_csv(\u0026#34;initial.csv\u0026#34;) df[\u0026#39;newColumn\u0026#39;] = df.apply(lambda x: x[\u0026#39;columnTarget1\u0026#39;] + x[\u0026#39;columnTarget2\u0026#39;], axis=1) Comparing the two, the latter is more succinct and to the point. Loops make the code harder to read, and the maintainer\u0026rsquo;s focus shifts to making sure the loops are set up correctly. The naive approach is the former, where one just loops over the rows; the latter is harder to grasp conceptually, but once understood, it is much easier to read and debug.\nThis leads to my point about vectorized operations. 
The former piece of Python code loops over the rows, fetches the data by index, manipulates it and then writes the value back into the dataframe by index. Most of this work happens at the Python level, which limits how fast it can go: the loop can only work on one item at a time. At a naive level, how would you make the operation faster? Ideally, the work would be done on many items at the same time.\nThis is where vectorized operations come in. Put simply, a vectorized operation is a computation applied to a whole array at once instead of one item at a time. Refer to this Wikipedia article: https://en.wikipedia.org/wiki/Array_programming\nWhen we use the column operations provided by pandas (which uses numpy under the hood), the operation is vectorized, which means the computation runs much faster. Note that apply with axis=1 still calls the Python function once per row; the fully vectorized version of the example above is df[\u0026#39;newColumn\u0026#39;] = df[\u0026#39;columnTarget1\u0026#39;] + df[\u0026#39;columnTarget2\u0026#39;]. Try proving this to yourself by creating a huge dataset and manipulating it both via loops and via the functions pandas offers; you will see that pandas outperforms the loop-based approach, especially as the datasets get bigger.\nHence, if you are already using the pandas library, use the functions pandas provides as much as possible, and only fall back to loops when it is really impossible to do otherwise. (That is rare; there are a whole bunch of functions you wouldn\u0026rsquo;t even imagine being in the library.)\nOne case that often comes up is a calculation across rows where, for a given row, the value needed is in the previous row. Given how pandas does its row-based and column-based calculations, it may appear impossible to do this unless one can somehow refer to the previous row in a row-wise calculation. 
However, another way to see it is that all we need to do is shift the data down by one row and then handle the rows where no previous value is available. This can be done with the shift method in pandas, or the lag/lead functions of the dplyr library in R.\nDecoupled Data Sources # This is the one pain point that is not obvious to people when they create the initial versions of their data processing automation scripts. Most of the time, scripts that automate data processing don\u0026rsquo;t last very long; they are used to solve a temporary problem, and once the problem is more or less \u0026ldquo;solved\u0026rdquo;, they are handed over to proper engineering teams who re-engineer them for proper use.\nBut what if you are the one who has to maintain the scripts over a long period of time? What would you need to consider?\nA few things come to mind:\nData sources that change over time. Maybe the initial prototypes were built on csv files provided by some manager in the company. However, that manager sends the data too infrequently, and you want faster and more frequent access to it, so you are given access to the database. Now the problem becomes making sure that the data being pulled out does not break your automation scripts. Plenty of checks would be needed just to ensure that the right data, in the right form, is coming in. Ensuring that the data sources have the right set of columns. This means testing the data to ensure certain columns exist for manipulation further down the line. This is especially important when data sources pass through \u0026ldquo;human hands\u0026rdquo;. The worst data sources are those manually and lovingly constructed by people. Part of the reason this happens is naivety. 
People assume scripts are robust enough to handle renamed or added columns, but that is exactly where most scripts start to fail. Even shifting the column order can easily break scripts that rely on column index numbers. One way to combat this problem is to abstract the reading of the data sources out of the main algorithmic part of the script. So instead of just writing this:\nimport pandas as pd data = pd.read_csv(\u0026#34;some-sample.csv\u0026#34;) data = data.groupby(\u0026#39;A\u0026#39;).sum() Here we are already making a few assumptions. It is assumed that the data loaded from some-sample.csv has, and will continue to have, a column A. (This may not hold; renaming that column would already break the script.)\nWe could instead add logic to check that the data contains certain columns, but that would begin to pollute the main script even further.\nI will provide another blog post on how to do this effectively.\nDecoupling data sources R (Coming soon) Decoupling data sources Python (Coming soon) Testing Algorithms # After having all the above in place, we can begin writing algorithms. This is the heart and meat of our script: where we encode our business logic and express how the data is to be manipulated to get the findings we need to decide the business\u0026rsquo;s next move.\nAlgorithms need to be tested, especially for the corner cases we expect; we need to write a whole bunch of test cases and make it easy to add more test cases in the future should the need arise.\nIn the Go community, there is an interesting concept called table-driven tests; it involves setting up an array-like structure to which one can easily append additional records to define new test cases. 
The test then goes through each test case and checks whether the algorithm\u0026rsquo;s output meets the specification for that case.\nI will provide another blog post with an example of how to do this effectively.\ntable driven tests in Python table driven tests in R Proper Config Management # Every data script has its own way of allowing configuration. Some scripts are written in a sophisticated manner such that the script exposes a command line interface which takes configuration flags as input. These flags affect the runtime behaviour of the script.\nAlternatively, such scripts can be controlled by having them read configuration files. Common configuration formats are json, yaml and even plain text files, but in the case of data manipulation, one sometimes needs to provide data mapping files (which can be considered a sort of configuration file as well).\nOne of the worst ways of handling configuration is to use a format that doesn\u0026rsquo;t allow you to track changes between versions. Text-based configuration files are fine, e.g. csv, text, json and yaml files. However, binary formats make it really hard to replicate and reproduce configurations and the various script runs.\nOne may ask: \u0026ldquo;Is config management that important?\u0026rdquo; My answer is to try maintaining a script where there is no single source of truth for its configuration. It becomes almost impossible to replicate the exact configuration that produced the error a user of the script reported. Tools that provide some sort of versioning (or named versions) make such issues much easier to handle.\nText-based formats are ready to use once they are combined with git. 
Such configuration files can be checked in, maintained and managed alongside the code, and retrieved easily. (E.g. you could create a new branch for each new configuration, so that a different run == a different script version.)\nEqually important is not only knowing the exact configuration needed to replicate and reproduce a specific run, but also understanding how the configuration requirements change over time. Knowing this allows a developer to understand some of the underlying assumptions made when designing the configuration (e.g. why certain awkward keys exist in it).\nAutomate Documentation # A common way people document code is not to do it while coding, but rather as an afterthought. It is sometimes done during a handover or at the request of some manager. Doing documentation this way makes it a dreaded process. Yet it is an important process, and when it is done months after the coding work or only on request, there are going to be gaps in the knowledge captured in the documentation.\nAfter being in the field for quite a while, I am of the opinion that code documentation should never be created in a separate tool or even in a separate document. Documentation should live together with the codebase. This helps ensure that as the code gets updated, the documentation is updated as well. Processes can then be put in place to ensure that every code change that alters definitions or functionality also requires a documentation change.\nI will provide a set of blog posts on how to do this in various languages:\nR Documentation Generation (Coming soon) Python Documentation Generation (Coming soon) However, let\u0026rsquo;s say other members of the team do not appreciate the generated documentation above. They want something they can contribute to themselves as well. 
Although it is tempting to start using document editors such as Microsoft Word, that would prove to be a bad choice in the long run. Part of the reason is that code bases evolve over time, which means the documentation needs to evolve alongside them.\nSurvived the initial hell # The initial hell involved the main writing of the scripts: getting your hands dirty coding the applications. As mentioned, the initial requirements gathering and the initial versions of the script are the easy bits. The items in this next section are not as important as the parts of the initial hell, but they definitely help. There\u0026rsquo;s a reason we are here: we\u0026rsquo;re here to automate everything, and if we still have to run the tasks manually, there are plenty of parts that can be improved.\nUsing docker to package the solution up Deploying code on linux machines and putting cron on it Running code as serverless (Functions as a service) Using tools such as Airflow to visually manage tasks Running tasks in a platform (Kubernetes) ","date":"15 October 2018","externalUrl":null,"permalink":"/best-practices-for-python-scripting-building-reliable-data-science-workflows/","section":"Posts","summary":"Data engineering work is fundamental to report generation in a business. Understanding the data that flows through the business, and maintaining all the scripts that pull and merge that data, makes the job much harder than one might expect. 
You are not expected to just be a script junkie; you are expected to be an expert in your domain, understanding the nuances and assumptions each line of a script imposes on how the data is processed.\n","title":"Best practices for Python scripting - Building Reliable Data Science Workflows","type":"posts"},{"content":"This is a little experiment to see how this would work when we have multiple Go binaries serving multiple web applications and want to expose them via a single http endpoint rather than a whole multitude of web endpoints.\nWe would have a single nginx server hit 3 different local backend endpoints. For a more complete demo, the backends should be hosted on different machines. However, to keep things simple, we can put all of these endpoints on a single machine and have nginx reach them via different paths.\nInstalling nginx # This is the easy bit. Since I am most familiar with debian/ubuntu, I usually just use the following command to install it:\nsudo apt install nginx This installs nginx on the machine.\nAdding 3 local backend endpoints to reach # We will use Go for our 3 backends in this example.\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;log\u0026#34; \u0026#34;net/http\u0026#34; ) // Change this port number to 8000, 8001 or 8002 for each of the 3 binaries var portNum = 8002 func sayHello(w http.ResponseWriter, r *http.Request) { log.Printf(\u0026#34;Say Hello to %v\u0026#34;, portNum) msg := fmt.Sprintf(\u0026#34;Application port: %v\u0026#34;, portNum) w.Write([]byte(msg)) } func status(w http.ResponseWriter, r *http.Request) { msg := fmt.Sprintf(\u0026#34;Status: %v\u0026#34;, portNum) w.Write([]byte(msg)) } func main() { log.Println(\u0026#34;This is a main\u0026#34;) defer log.Println(\u0026#34;Exit Main\u0026#34;) http.HandleFunc(\u0026#34;/\u0026#34;, sayHello) http.HandleFunc(\u0026#34;/status\u0026#34;, status) if err := 
http.ListenAndServe(fmt.Sprintf(\u0026#34;:%v\u0026#34;, portNum), nil); err != nil { panic(err) } } The code above builds into a small web binary that just reports which port the web request is being served from.\nTo simplify the setup on the server, it is easiest to compile locally and then scp the binary over to the remote machine. (We shouldn\u0026rsquo;t do this for production systems; create proper CI/CD/gated deployment systems that help reduce risks during deployment.)\nenv GOOS=linux GOARCH=amd64 go build -o test1 ./main.go With the above, we can set up the backends by compiling 3 different binaries that listen on ports 8000, 8001 and 8002 respectively.\nWe can then transfer the binaries over to the machine via the scp command.\nscp ./test1 {name on server}@{server ip address}:~ # Full example # The part after the colon is the destination path on the remote machine (here, the home directory) scp ./test1 nameonserver@192.0.0.105:~ Finally, we can run the applications on the remote machine via the following command.\n./test1 \u0026amp; We can test it via the curl command\ncurl localhost:8000 It should print out the application\u0026rsquo;s port number.\nConfiguring nginx # The main nginx configuration can be found in the /etc/nginx/ directory. The default configuration imports configs from the /etc/nginx/sites-enabled folder, but the files in this folder are symlinks to files in the /etc/nginx/sites-available folder. This leads to the following workflow:\nCreate the required nginx configuration in the sites-available folder Create a symlink to it in the sites-enabled folder Check that the nginx configuration is valid and can be loaded with no issues Reload the nginx configuration and watch nginx work its magic Configuring nginx # Firstly, we need to add the configuration to the sites-available folder. The following config can be added. 
Let\u0026rsquo;s say we saved it as test1 (the name used in the symlink step below)\nserver { listen 80; listen [::]:80; location /test1 { return 302 /test1/; } location /test2 { return 302 /test2/; } location /test3 { return 302 /test3/; } location /test1/ { proxy_pass http://localhost:8000/; } location /test2/ { proxy_pass http://localhost:8001/; } location /test3/ { proxy_pass http://localhost:8002/; } } Notice that there is a /test1 and a /test1/. The trailing slash matters a lot here. nginx picks the longest matching prefix, so requests under /test1/ (e.g. /test1/status) are handled by the /test1/ block, where proxy_pass with a trailing slash replaces the /test1/ prefix with / before forwarding. For example, http://example.com/test1/status is forwarded to http://localhost:8000/status. The /test1 block catches the bare /test1 path and redirects it to /test1/.\nWith the above configuration, the test1 path hits the localhost:8000 backend, the test2 path hits localhost:8001 and the test3 path hits localhost:8002.\nCreating symlinked file on sites-enabled # We then need to create the symlinked file in the sites-enabled folder.\nUse the following command:\n# Assuming that you\u0026#39;re in the sites-enabled folder # This creates a test1 symlink pointing to the test1 config file in sites-available ln -s ../sites-available/test1 test1 Checking nginx configuration is fine # Use the following commands to check that the configuration in the files is valid (nginx -t), or to check it and also dump the configuration files (nginx -T)\nsudo nginx -t # OR sudo nginx -T If there are any issues in the nginx files, it will complain (it also tells you the exact line it doesn\u0026rsquo;t accept, which is quite nice), allowing us to quickly iterate towards a valid configuration.\nReload nginx # Many linux systems have shifted to using systemd, which we can use to control our nginx process. 
So, we can refresh the nginx configuration (without taking nginx down)\nsudo systemctl reload nginx If the command above runs without any issues, it can be assumed that nginx reloaded successfully. We can then proceed to pummel nginx with our requests. Try curling the server to see if it works as expected.\n# Assuming that you\u0026#39;re on the server # -L makes curl follow the 302 redirect from /test1 to /test1/ curl -L localhost/test1 curl localhost/test1/status curl -L localhost/test2 curl -L localhost/test3 Final thoughts # With the above, we can mix and match rules on the server to get the configuration we want. In the case above, nginx acts as an ingress that sends traffic within a single server to multiple backends, but the idea makes even more sense from a multi-server point of view. Imagine a service provider that offered several IP addresses which you had to call separately to get your job done; that would be a hard-to-use service.\n","date":"29 September 2018","externalUrl":null,"permalink":"/using-nginx-to-serve-as-ingress-to-multiple-servers/","section":"Posts","summary":"This is a little experiment to see how this would work when we have multiple Go binaries serving multiple web applications and want to expose them via a single http endpoint rather than a whole multitude of web endpoints.\n","title":"Using nginx to serve as ingress to multiple servers","type":"posts"},{"content":"","date":"12 September 2018","externalUrl":null,"permalink":"/categories/git/","section":"Article Categories","summary":"","title":"Git","type":"categories"},{"content":"","date":"12 September 2018","externalUrl":null,"permalink":"/tags/git/","section":"Technology Tags","summary":"","title":"Git","type":"tags"},{"content":"Git is one of the most important tools in a software developer\u0026rsquo;s arsenal. 
It is the main tool developers use to handle and control the versioning of their code. Mastering it will make one\u0026rsquo;s life much easier; failing to do so will bring one into a world of pain. This post doesn\u0026rsquo;t intend to explain vital concepts such as git branches, forks and remotes in great detail, so it would be ideal to pick up those concepts before proceeding with the commands.\nSome basic git concepts # Branches https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell Forks https://help.github.com/articles/fork-a-repo/ Remote https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes git - The beginning # This assumes that it is the first time you are using git for your projects and you need a quick list of commands to get something done. In that case, the following commands take you through these steps:\nCloning a repository to your local computer Working on your own (lightweight) branch on your local computer. This helps prevent your code from messing up the master branch. Adding your changes, committing and pushing to the remote Updating your local clone by fetching, diffing and comparing # Clone a repository git clone {{ remote url }} git clone https://github.com/user/repo.git # e.g. 
# Creating a new branch git checkout -b new_branch # After making some changes git add --all git commit -m \u0026#34;A new temporary feature has been added\u0026#34; git push -u origin new_branch # After making even more changes git add --all git commit -m \u0026#34;Another new temporary feature has been added\u0026#34; git push # After this, open a PR to the master branch # If more changes have been added to master and you need to build off the latest master git fetch origin master # Updates the local origin/master remote-tracking branch git diff master origin/master # Checks that the diff is fine and causes no issues git pull origin master # Fetches origin\u0026#39;s master and merges it into the current branch # Repeat the cycle A few things to note; initially, try not to get into the habit of the following:\nDon\u0026rsquo;t push directly to master (master should hold code that everyone/the code owners agree to add to the codebase. To make the process of accepting code into master more obvious, open pull requests against it) Necessary rebasing efforts # It is often said that rebasing is bad, but it is more accurate to say that rebasing is bad when it affects other people working on the same code. In bigger code bases, you don\u0026rsquo;t necessarily want to litter the history with redundant commits, e.g. a change followed by a commit that reverts it. Those kinds of commits can be \u0026ldquo;dropped\u0026rdquo;, which can be done via the git rebase command.\n# If you are on the temp branch git rebase master # Finds where temp diverged from master, then replays the additional commits from temp on top of the tip of master # If you need to drop, squash or alter commit messages: git rebase -i HEAD~10 # Or choose another number # This allows you to interactively manipulate the git history. # Before doing this and merging it to master, try it on some sample branches. 
# Make sure that the git commit ids are correct and as expected Useful git commands # List of useful git commands. This is not an exhaustive list of the available git commands; for that, you might want to refer to the git man pages.\n# Initialize a git repository # One can create a git repository anywhere. You don\u0026#39;t necessarily need a remote git repository for this git init # Cloning a repository from remote url # Default branch is usually master # Remote is usually called origin git clone {{ remote url }} git clone https://github.com/user/repo.git # Adding changes git add --all # Try not to use this git diff {{ file name }} # Check what changes have been made to the file so far git add {{ file name }} # Having multiple remotes # Useful when moving repos between groups or mirroring a repo across multiple remotes # It is also another way to git remote -v # This is to view the list of remotes available git remote add {{ new remote ref }} {{ new remote url }} git remote add tempMirror https://github.com/user1/repo.git # View list of logs git log git log -n 5 # View commit information git show {{ commit id }} # Checking out to other branches git checkout {{ branch_name }} git checkout {{ commit id }} git checkout tags/{{ tag name }} # Deleting branches git branch -D {{ branch name }} # Viewing branches git branch # Grabbing branches from remote git fetch origin {{ branch name }} # For safer development, create a new branch and then open a pull request to the master fork/branch # Creates a new branch from the current branch # You can then push it to the remote git checkout -b {{ new branch name}} git push -u origin {{ new branch name }} # After making some changes, you can do the following since the remote has already been set up git add --all git commit -m \u0026#34;Commit message\u0026#34; git push # If you don\u0026#39;t want a commit in a git repository, you can revert the commit # This would create a new commit 
that does the reverse of the selected commit git revert {{ commit id }} # To view who made the changes to a file, you can either dig through each commit one at a time # OR, you can just use the following command git blame {{ file name }} ## Bringing over changes from another branch to the current branch git cherry-pick {{ commit id }} ## You\u0026#39;ve created a PR and you need to update the PR based on other people\u0026#39;s comments ## This assumes that other people are reviewing the PR as a whole; ## They do not wish to see the changes made since their last review ## Instead of adding a bunch of commits and squashing them, you can just \u0026#34;amend\u0026#34; the commit in the PR ## This allows you to amend the commit as well as the commit message ## (an amended commit that was already pushed needs a force push, e.g. git push --force-with-lease) git commit --amend Git submodules # These are pretty rare; they occur in larger code bases that orchestrate and make use of many different components which the project may not depend on heavily.\n# Initialize submodules git submodule init # Update the submodules git submodule update # Combine the init and update together recursively # (Go down through the folders and initialize and update the submodules along the way) git submodule update --init --recursive Updating the submodules only changes the commit reference recorded in the main code repo. One can do normal git interactions in each of the child repos, but once you go back up to the main repo, you will need to update the commit hashes for each of the child repos that have been altered.\n","date":"12 September 2018","externalUrl":null,"permalink":"/git-cheatsheet/","section":"Posts","summary":"Git is one of the most important tools in a software developer’s arsenal. It is the main tool developers use to handle and control the versioning of their code. Mastering it will make one’s life much easier; failing to do so will bring one into a world of pain. 
This post doesn’t intend to explain vital concepts such as git branches, forks and remotes in great detail, so it would be ideal if one picks up those concepts before proceeding with the commands.\n","title":"Git Cheatsheet","type":"posts"},{"content":"This post details my naive attempt to bring up a Kubernetes cluster on a VM. These steps were used to try out Kubernetes on a bare Google Cloud virtual machine (but they should work for most Debian/Ubuntu virtual machines). This deploys a single node Kubernetes cluster (naturally, don\u0026rsquo;t use this for production).\nlxc is the command-line client for lxd, which runs Linux containers: https://en.wikipedia.org/wiki/LXC. The conjure-up tool then installs Kubernetes on top of lxd.\nInstalling snap, lxc and conjure-up # We will deploy a Kubernetes cluster on a single node using the snap utility. The commands to do it are below. We install snap via snapd. Then, using snap, we install lxd, kubectl and conjure-up. Finally, we add our own username to the lxd group so that we don\u0026rsquo;t need sudo when running lxd/lxc commands (we use lxc to communicate with lxd); the group change takes effect after logging out and back in. sudo apt update sudo apt -y install snapd sudo snap install lxd sudo snap install kubectl --classic sudo snap install conjure-up --classic sudo usermod --append --groups lxd {{ NAME }} Deploy a Kubernetes cluster via conjure-up # We first initialize the environment for the Linux containers to live in; the init command sets up the network as well as the storage. One thing to note is that for the snap-based Kubernetes install, only the dir storage type worked for us; other storage types caused the deployment to halt completely. # Most defaults work fine except for storage; for storage, choose the dir type lxd init # Questions and choices # Would you like to use LXD clustering? (yes/no) [default=no]: no # Do you want to configure a new storage pool? 
(yes/no) [default=yes]: # Name of the new storage pool [default=default]: # Name of the storage backend to use (btrfs, ceph, dir, lvm) [default=btrfs]: dir # Would you like to connect to a MAAS server? (yes/no) [default=no]: # Would you like to create a new local network bridge? (yes/no) [default=yes]: # What should the new bridge be called? [default=lxdbr0]: # What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: # What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none # Would you like LXD to be available over the network? (yes/no) [default=no]: # Would you like stale cached images to be updated automatically? (yes/no) [default=yes] # Would you like a YAML \u0026#34;lxd init\u0026#34; preseed to be printed? (yes/no) [default=no]: conjure-up kubernetes # Questions and choices # What kind of kubernetes installation: kubernetes-core # Where to install: localhost # Storage pool: default # Network bridge: lxdbr0 # Network: flannel The generated lxd init preseed:\nconfig: {} networks: - config: ipv4.address: auto ipv6.address: none description: \u0026#34;\u0026#34; managed: false name: lxdbr0 type: \u0026#34;\u0026#34; storage_pools: - config: {} description: \u0026#34;\u0026#34; name: default driver: dir profiles: - config: {} description: \u0026#34;\u0026#34; devices: eth0: name: eth0 nictype: bridged parent: lxdbr0 type: nic root: path: / pool: default type: disk name: default cluster: null The conjure-up command takes a while to run; how long depends on the virtual machine\u0026rsquo;s network. The faster the network, the faster this can be deployed. After waiting for a while, the Kubernetes cluster should be up.\nTesting out the Kubernetes cluster # We will try running some nginx containers, as they are among the simplest to run. 
With the nginx containers, we should be able to hit the service on port 80 and get back the default nginx page.\nkubectl run --image nginx lol kubectl run --image nginx lol1 kubectl expose deployments lol --type NodePort --port 80 kubectl exec -it {{ lol1-pod-name }} -- /bin/bash Within the container, we can query the lol container to check that it serves the default nginx page.\napt update apt install curl curl {{ ip address of lol }}:80 Unfortunately, to date, I haven\u0026rsquo;t been able to expose the Kubernetes cluster to any traffic from the outside world. There are several tactics one can try, but none of them worked for me; it could be a misconfiguration on my part. Networking is a serious pain here: many solutions could potentially solve the issue, but I\u0026rsquo;m not sure why any particular one would or would not work.\nSome of the possible approaches to get traffic into the cluster (I couldn\u0026rsquo;t get any of them to work here):\nUsing iptables. This needs a lot of custom configuration; I\u0026rsquo;ve tried using FORWARD rules, manipulating ACCEPT and even REDIRECT, but none seemed to work. Using NodePorts, though this is definitely not the ideal solution here. It exposes unusual ports in the 30000+ range (30000-32767 by default). Such ports are quite likely blocked on company networks, so it wouldn\u0026rsquo;t make much sense to rely on them here. Using Kubernetes Ingress. This requires a domain name to work well. It is also a lot of configuration work, but the main issue is that we can\u0026rsquo;t easily tell whether traffic hitting the host machine actually reaches the Kubernetes cluster. That is hard to inspect; tooling is definitely needed to check it. Using externalIPs and ExternalName services. 
These resources would not be managed by Kubernetes; however, I\u0026rsquo;m not sure why these aren\u0026rsquo;t working as expected either.\nI will still attempt to play around with this tool for deploying Kubernetes clusters, but finding a way to expose the traffic to the outside will take a while.\n","date":"5 September 2018","externalUrl":null,"permalink":"/attempting-to-setup-kubernetes-on-ubuntu-vms/","section":"Posts","summary":"This post details my naive attempt to bring up a Kubernetes cluster on a VM. These steps were used to try out Kubernetes on a bare Google Cloud virtual machine (but they should work for most Debian/Ubuntu virtual machines). This deploys a single node Kubernetes cluster (naturally, don’t use this for production).\n","title":"Attempting to setup Kubernetes on Ubuntu VMs","type":"posts"},{"content":"Meetup.com is a pretty nice site for setting up meetups and technology sharing sessions. The platform is easy to use when it comes to bookings, but sometimes the data provided by its web interface is not sufficient or does not fit our use case. For example, let\u0026rsquo;s say you are trying to understand the trend in the number of people attending a meetup. For an organizer, it is important to understand what kinds of actions lead to higher turnout/registrations for a meetup. 
So, by the end of this post, hopefully we will have a reasonably priced (free if possible) analytics solution that is only called occasionally.\nThere are a few ways to solve this, but in this post, we\u0026rsquo;ll be focusing mainly on the third and last option.\nUsing R and Python Scripts Using free platform compute resources Using Serverless solutions Creating a GCF Python app Deploying GCF Python app via Google Cloud Repositories Setting up CI/CD pipelines via use of Google Cloud Builder Integration with Slack Slash commands Getting the full picture from more complete code List of links for TLDR Google Cloud Documentation Links Slack Documentation Links Other Links Using R and Python Scripts # The first and easiest approach is to have a Python or R script that pulls the values from the Meetup API and manipulates them to generate the graphs we need for analysis.\nThis solution is easy to start with, although working with scripts makes it difficult to run such analysis on demand. Ideally, this information would be available at any time, but with this approach, whoever generates the analysis needs access to a computer with the R or Python runtime installed.\nIn the bigger picture, it would be best to move such scripts off a local computer, where a person has to run them manually, onto a server that exposes them as an API. This would allow them to be consumed by chat or mobile applications, making it easier to obtain the data for analysis.\nUsing free platform compute resources # The Google platform provides several free resources for computation work. 
The list of free resources is available here for your convenience.\nhttps://cloud.google.com/free/\nFor example, the code could be hosted on Google App Engine. API endpoints or cron jobs could be set up, which could then be used by chat or mobile applications. The data could be processed and output into the various chat or data applications out there.\nUsing Serverless solutions # Another possible solution is to go serverless; in the Google Cloud Platform world, that means Google Cloud Functions. During the Google Cloud Next 2018 event, it was announced that Cloud Functions is now in General Availability for the Node.js 6 runtime. However, the more exciting bit was the announcement that a Python runtime is now supported in beta.\nSee the July 24 entry in its release notes for further information:\nhttps://cloud.google.com/functions/docs/release-notes\nSo with Python support, we can now start writing Python applications/scripts that run on it.\nBefore getting started, it is worth asking why we would use this rather than Compute Engine or App Engine. One strong reason is the nature of the application we are building here. In our case, we would be running the script/application only occasionally (sometimes needing just a few seconds of compute each day), so it doesn\u0026rsquo;t make sense to start a beefy Compute Engine instance just to do that work. 
However, it would still be nice to have an API available 24/7 that can be called at our convenience.\nSince Google Cloud Functions are billed in 100ms increments (the amount of memory allocated also affects the price), we get fine-grained control over the amount of money we spend, making this a cheap and viable option, especially for an application that is only used occasionally.\nIt would also be best to set up some sort of CI/CD pipeline for developing functions on Google Cloud Functions. This makes deployment much easier and speeds up getting the application running on the platform.\nTo sum it up, this post covers the following aspects:\nA very basic Google Cloud Functions Python app (API) Deploying it via Google Cloud Source Repositories Setting up CI/CD pipelines via use of Google Cloud Builder Integrating with Slack slash commands Creating a GCF Python app # We can try a very simple Python app just to get our feet wet with Google Cloud Functions. There is a quickstart guide on the documentation page, but a copy of it is also available here for completeness\u0026rsquo; sake.\nRefer to the documentation here for a fuller explanation:\nhttps://cloud.google.com/functions/docs/tutorials/http\ndef hello_get(request): \u0026#34;\u0026#34;\u0026#34;HTTP Cloud Function. Args: request (flask.Request): The request object. Returns: The response text, or any set of values that can be turned into a Response object using `make_response` \u0026lt;http://flask.pocoo.org/docs/0.12/api/#flask.Flask.make_response\u0026gt;. 
\u0026#34;\u0026#34;\u0026#34; return \u0026#39;Hello, World!\u0026#39; If you don\u0026rsquo;t want to deal with much tooling while trying it out, you can simply copy it into the inline editor that is available in the Cloud Console for Google Cloud Functions.\nA requirements.txt file is not needed for an initial deployment. However, it is vital for us to understand the limitations of the platform.\nOne of the main gripes I have about serverless platforms (AWS Lambda included) is that we have less control over the OS used to run our code. Let\u0026rsquo;s say we are building an application that relies on the ffmpeg binaries. That would be hard to run on AWS Lambda because those binaries are not readily available on the OS powering AWS Lambda underneath. I\u0026rsquo;m not sure whether the same limitation affects Google Cloud Functions as well.\nFor an example of how the dependency problem is solved, look no further than the serverless tool. The website is available here:\nhttps://serverless.com/\nTo get Python dependencies in, the serverless tool spins up a Docker container that builds those dependencies (if needed). It then zips them up and ships them over to an S3 bucket, which is then used to deploy the AWS Lambda function.\nLuckily, there is no need for all that contorted mess in Google Cloud Functions. It was able to install particularly difficult libraries, e.g. pandas, with no significant issues (this was hard when I was trying it with AWS Lambda).\nAlongside the Python code in main.py above, just add a requirements.txt and try it out.\nnumpy pandas import pandas as pd import numpy as np def hello_get(request): \u0026#34;\u0026#34;\u0026#34;HTTP Cloud Function. Args: request (flask.Request): The request object. 
Returns: The response text, or any set of values that can be turned into a Response object using `make_response` \u0026lt;http://flask.pocoo.org/docs/0.12/api/#flask.Flask.make_response\u0026gt;. \u0026#34;\u0026#34;\u0026#34; s = pd.Series([1,3,5,np.nan,6,8]) return \u0026#39;Hello, World!\u0026#39; It should import and run with no issues.\nDeploying GCF Python app via Google Cloud Repositories # There are various ways to deploy a Google Cloud Function. At the moment, one can type the code straight into its editor, or upload a zip file either directly to Google Cloud Functions or via Google Cloud Storage. The last alternative is to set it up with Google Cloud Source Repositories.\nGoogle Cloud Source Repositories is an interesting approach. Rather than having to zip up files and folders and ship them into S3 etc., one can just point the Google Cloud Function at the repo directly. The nice bit is that Google Cloud Source Repositories can easily be set up to mirror more traditional code hosting services, e.g. GitHub or Bitbucket, so the code is mirrored over.\nThere is no real need for a bunch of screenshots showing how to set up mirroring in Google Cloud Source Repositories. The forms in the tool are easy and intuitive to understand; one can just click through and set this workflow up without going through any documentation.\nWe can then deploy code from a specific branch or tag, and even from a specific folder. Being able to specify all of these details makes this a pretty flexible and easy solution. 
Refer to the link below for more details on this:\nhttps://cloud.google.com/sdk/gcloud/reference/functions/deploy\nSetting up CI/CD pipelines via use of Google Cloud Builder # Since the solution can be deployed with just the gcloud CLI tool, we can replicate the same steps with the Google Cloud Build tool.\nGoogle Cloud Build is Google\u0026rsquo;s answer to build systems at scale. Think of how a company\u0026rsquo;s build system typically evolves:\nDevelopers start off using Jenkins, as it is the standard build tool in the industry. As time goes by, more builds are needed on Jenkins, so it becomes essential to run Jenkins in a master and slave configuration, where the master allocates build jobs to the slaves, which build the apps for deployment. Too many configurations, libraries and junk get put into Jenkins, so the build system evolves to use Docker to build inside containers, isolating the different apps and their dependencies from each other. Google Cloud Build is the last step: a scalable build solution managed by the platform. 
A cloudbuild.yaml file is used to specify the different steps needed to build the application, which can then be sent to the target platform.\nFor example, for Google Cloud Functions, the following configuration is helpful:\nsteps: - name: \u0026#34;gcr.io/cloud-builders/gcloud\u0026#34; args: [ \u0026#34;beta\u0026#34;, \u0026#34;functions\u0026#34;, \u0026#34;deploy\u0026#34;, \u0026#34;{function name}\u0026#34;, \u0026#34;--region=asia-northeast1\u0026#34;, \u0026#34;--source=https://source.developers.google.com/projects/{projectid}/repos/{repo name}/moveable-aliases/{branch name}/paths/{path name}\u0026#34;, \u0026#34;--trigger-http\u0026#34; ] Some of the odd things I ran into while setting up CI/CD with Google Cloud Build:\nIf the command is called without a region, it redeploys, but to a different region (so it is necessary to specify the region here). The assumption is that it falls back to some default region. If the command is called without a source, it redeploys but the source repo does not change; it just seems to redeploy the same copy of the codebase. The general rule is to pass every parameter you would need for an initial deploy; there is no sense of a \u0026ldquo;previous state\u0026rdquo; of the application being deployed before. Permissions are a big pain point here; not all of the required permissions are mentioned in the documentation. To get it working, the minimum set of roles needed is: Cloud Build Service Account Cloud Function Developer Cloud Function Service Agent (Continuing on permissions) This assumes that we are deploying Google Cloud Functions from repositories in Google Cloud Source Repositories. If we deploy by sending a zip via Google Cloud Storage instead, it may be necessary to add permissions to read from and write to Google Cloud Storage. 
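The --source URL in the cloudbuild.yaml above follows a fixed pattern, so building it (and the deploy args) in code makes the branch, tag and folder choices explicit. What follows is a minimal sketch in Python. It assumes hypothetical helper names, source_repo_url and deploy_args, and placeholder project, repo and path values, and none of it comes from the meetup-stats repo. Branches map to moveable-aliases as in the config above. Mapping tags to fixed-aliases is our reading of the gcloud functions deploy reference and should be checked there.

```python
# Hypothetical helpers (not from the meetup-stats repo) that build the
# --source URL and the args list used in the cloudbuild.yaml step.
BASE_URL = 'https://source.developers.google.com/projects'


def source_repo_url(project_id, repo_name, path, branch=None, tag=None):
    # Exactly one of branch or tag must be given
    if (branch is None) == (tag is None):
        raise ValueError('pass exactly one of branch or tag')
    if branch is not None:
        alias = 'moveable-aliases/' + branch  # branches are moveable aliases
    else:
        alias = 'fixed-aliases/' + tag  # assumption: tags are fixed aliases
    # Drop leading slashes so the path segment is not doubled
    parts = [BASE_URL, project_id, 'repos', repo_name, alias, 'paths', path.lstrip('/')]
    return '/'.join(parts)


def deploy_args(function_name, region, source_url):
    # Always pass region and source: a redeploy without them may land in
    # another region or reuse the old source, as described above
    return [
        'beta', 'functions', 'deploy', function_name,
        '--region=' + region,
        '--source=' + source_url,
        '--trigger-http',
    ]


url = source_repo_url('my-project', 'meetup-stats', 'functions', branch='master')
print(url)
# https://source.developers.google.com/projects/my-project/repos/meetup-stats/moveable-aliases/master/paths/functions
print(deploy_args('meetup-stats', 'asia-northeast1', url)[:4])
# ['beta', 'functions', 'deploy', 'meetup-stats']
```

Always passing the region and source, as deploy_args does, sidesteps the redeploy quirks listed above.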
Integration with Slack Slash commands # So, we have a working HTTP API that we can call with curl. How can we make it easily accessible at any time? One way would be to link it up with Slack. Slack has slash commands, which allow it to integrate with external APIs. A Slack slash command sends a POST request with a form-encoded body to the API URL specified. The form body contains all kinds of information, including which channel the slash command was called from.\nAs usual, before we get started, we need to handle permissions, so go to https://api.slack.com/apps and activate the following features:\nIncoming webhooks Slash commands Permissions (some of these are turned on automatically when the corresponding feature is activated) Access information about user\u0026rsquo;s public channels Send messages as bot Send messages as service Post to specific channels Upload and modify files Add Slash commands Once we have that, we can interact with Slack\u0026rsquo;s API.\nThe following is a simple Python function that sends a message to a channel on Slack:\nimport json import requests def send_text_to_channel(slack_token, slack_channel_id, text): upload_url = \u0026#34;https://slack.com/api/chat.postMessage\u0026#34; headers = {\u0026#34;Authorization\u0026#34;: \u0026#34;Bearer \u0026#34; + slack_token} data = {\u0026#34;channel\u0026#34;: slack_channel_id, \u0026#34;text\u0026#34;: text} response = requests.post(upload_url, headers=headers, data=data) if response.status_code != 200 or not response.json().get(\u0026#34;ok\u0026#34;): raise Exception(json.dumps({\u0026#34;error\u0026#34;: \u0026#34;Unable to send text\u0026#34;})) Slack returns HTTP 200 with an ok field set to false for most API errors, which is why the ok field is checked as well as the status code. One could rely on a third-party Slack library, but since we only use a small subset of features, it doesn\u0026rsquo;t make much sense to hunt for a good library.\nGetting the full picture from more complete code # For a fuller picture of how the whole thing works, the full source code is publicly available here: 
https://github.com/hairizuanbinnoorazman/meetup-stats\nList of links for TLDR # If the article above is too long to read, this section lists all the links needed to get started with Google Cloud Functions and its family of tools to create a Slack slash command that pulls meetup stats into a Slack channel.\nGoogle Cloud Documentation Links # List of free Google Cloud Platform resources that can be used (including available quotas)\nhttps://cloud.google.com/free/ Simple Python application on Google Cloud Functions\nhttps://cloud.google.com/functions/docs/tutorials/http Google Cloud Functions Pricing\nhttps://cloud.google.com/functions/pricing Deploying a Google Cloud Function via the gcloud CLI\nhttps://cloud.google.com/sdk/gcloud/reference/functions/deploy Google Cloud Builder Documentation\nhttps://cloud.google.com/cloud-build/docs/ Slack Documentation Links # Slack API\nhttps://api.slack.com/apps Slack Slash Commands Documentation\nhttps://api.slack.com/slash-commands Slack Incoming Webhook Documentation\nhttps://api.slack.com/incoming-webhooks Other Links # GitHub repository with the working code for this\nhttps://github.com/hairizuanbinnoorazman/meetup-stats Link to a summary of some videos from Google Cloud Next (non-exhaustive) ","date":"24 August 2018","externalUrl":null,"permalink":"/getting-meetup-stats-with-google-cloud-functions/","section":"Posts","summary":"Meetup.com is a pretty nice site for setting up meetups and technology sharing sessions. The platform is easy to use when it comes to bookings, but sometimes the data provided by its web interface is not sufficient or does not fit our use case. For example, let’s say you are trying to understand the trend in the number of people attending a meetup. For an organizer, it is important to understand what kinds of actions lead to higher turnout/registrations for a meetup. 
So, by the end of this post, hopefully we will have a reasonably priced (free if possible) analytics solution that is only called occasionally.\n","title":"Getting Meetup Stats with Google Cloud Functions","type":"posts"},{"content":"Developing applications that are meant to be deployed to the Kubernetes platform involves a bunch of steps on top of your usual local development work:\nWriting a Dockerfile to package the application (multi-stage builds are optional here, but useful for compiled languages) Building and tagging the Docker image of the application with the target repository Using either kubectl commands or Kubernetes resource config files to define the resources required for deploying the application Applying those commands/configurations to create the resources in the staging/production environment Repeating the process for each update of the application (from the second point onwards) As you can see, it becomes a pain to do this after each iteration of application development. Building the Docker containers and applying the new images to each cluster gets tedious, especially since configuration files (e.g. the Kubernetes secret and config files) can change slightly across environments.\nOne can use bash scripts to handle this, but there is another tool that can be used: skaffold.\nThe skaffold tool can be found here:\nhttps://github.com/GoogleContainerTools/skaffold\nThis tool is now much easier to try out, especially since Docker now packages Kubernetes alongside it (it\u0026rsquo;s an optional installation, but still way easier than finding other tools on the market and getting them running).\nThe tool provides several interesting features that I want to highlight:\nHot reloading of the application. When you save your application code, the skaffold dev command rebuilds and redeploys your application. 
It only watches the paths specified in the skaffold.yaml file, i.e. the directories of your Docker context as well as the Kubernetes manifest files. Storing multiple profiles. This allows one to switch between different types of deployments from the same command line. For example, I can develop locally and, once I\u0026rsquo;m happy, run something like skaffold run -p stag to deploy the required images into a staging or even a QA environment. If locally I only have mocked services to play around with, I can easily send my code over to a cluster that is more fully \u0026ldquo;set up\u0026rdquo; with additional services to properly test the code. Switching the tools used for building the Docker containers. Most of the time, I can use my local computer to build the application, but sometimes it is nice to rely on an external build system to build the application for me. This matters more if the application relies on a lot of packages and I need a clean build (with zero cache usage); it helps for the container builder to be in an environment where the packages can be downloaded at high speed (e.g. 
on cloud infrastructure). Here is a sample working application and skaffold configuration for a local environment:\nhttps://github.com/hairizuanbinnoorazman/kubeapps/tree/master/basicSkaffold\n","date":"9 August 2018","externalUrl":null,"permalink":"/trying-out-skaffold/","section":"Posts","summary":"Developing applications that are meant to be deployed to the Kubernetes platform involves a bunch of steps on top of your usual local development work:\nWriting a Dockerfile to package the application (multi-stage builds are optional here, but useful for compiled languages) Building and tagging the Docker image of the application with the target repository Using either kubectl commands or Kubernetes resource config files to define the resources required for deploying the application Applying those commands/configurations to create the resources in the staging/production environment Repeating the process for each update of the application (from the second point onwards) As you can see, it becomes a pain to do this after each iteration of application development. 
Building the Docker containers and applying the new images to each cluster gets tedious, especially since configuration files (e.g. the Kubernetes secret and config files) can change slightly across environments.\n","title":"Trying out skaffold","type":"posts"},{"content":"","date":"2 August 2018","externalUrl":null,"permalink":"/categories/mermaid/","section":"Article Categories","summary":"","title":"Mermaid","type":"categories"},{"content":"","date":"2 August 2018","externalUrl":null,"permalink":"/tags/mermaid/","section":"Technology Tags","summary":"","title":"Mermaid","type":"tags"},{"content":"There is an interesting JavaScript project that converts plain old text into diagrams.\ngraph TD Start --\u003e Stop Text that converts to the above diagram:\ngraph TD Start --\u0026gt; Stop Mermaid Documentation Page\nhttps://mermaidjs.github.io/\nMermaid.js source code\nhttps://github.com/knsv/mermaid\nLet\u0026rsquo;s say we want to use this diagram conversion tool in conjunction with Hugo; how should we do it? Do we somehow need to embed HTML partial snippets all over the place?\nLuckily, Hugo has a mechanism for this: shortcodes. The link to the documentation is below:\nHugo Shortcodes: https://gohugo.io/content-management/shortcodes/\nEssentially, a shortcode injects HTML into the page rendered from the markdown file. Since it is plain old HTML, we can inject all kinds of HTML templates, including ones with JavaScript. That\u0026rsquo;s essentially how one can embed the YouTube video player into blog posts.\n{{\u0026lt; youtube XXXXXX \u0026gt;}} Here, XXXXXX stands for the YouTube video id (the v parameter of the YouTube URL).\nHugo ships with several built-in shortcodes, including ones for Instagram and Twitter.\nHowever, let\u0026rsquo;s go back to using Mermaid with Hugo. 
Hugo does not support Mermaid.js out of the box, so one cannot just use a built-in shortcode to inject Mermaid.js HTML snippets all over the blog. However, Hugo provides a mechanism to build custom shortcodes (also documented on the Hugo website). For a reference on how to add such functionality, one can look at the following example.\nMermaid.js + Hugo example\nhttps://github.com/matcornic/hugo-theme-learn\nThis theme relies on Hugo and Mermaid.js to generate pages with SVG diagrams on them as necessary.\nThere are a few things to take note of while building the shortcode to support it:\nDo not import mermaid.js into the site via a CDN. When running locally, the browser blocked the external script from running. Mermaid.js has had some breaking changes between versions; take note of them. The Hugo website mentions that one can put JS and CSS files into the static folder. Coming from frontend development work, it is easy to assume that those JS and CSS files would be compiled and imported automatically. This is definitely not true in this case; an HTML snippet is needed to actually import the required files into the frontend. Steps involved: Add the CSS and JS files required by the mermaid.js library to the static folder. Put the mermaid.html shortcode into the layouts/shortcodes folder. Add a partial HTML snippet in layouts/partials to import the files into the final rendered HTML output; the partial should import the CSS and JS locally. Initialize the mermaid.js script so that it knows when to convert the text into diagrams. 
With that in place, the following markup renders the graph above:\n{{\u0026lt;mermaid\u0026gt;}} graph TD Start --\u0026gt; Stop {{\u0026lt;/mermaid\u0026gt;}} Just a little fun experiment to see how far this Hugo framework can take me.\n","date":"2 August 2018","externalUrl":null,"permalink":"/rendering-diagrams-in-hugo/","section":"Posts","summary":"There is an interesting JavaScript project that converts plain old text into diagrams.\ngraph TD Start --\u003e Stop Text that converts to the above diagram:\n","title":"Rendering diagrams in Hugo","type":"posts"},{"content":"","date":"29 July 2018","externalUrl":null,"permalink":"/tags/conference/","section":"Technology Tags","summary":"","title":"Conference","type":"tags"},{"content":"Google recently held its Google Cloud Next conference, featuring the exciting new technologies being made available in Google Cloud. There were numerous product announcements in the various keynotes as well as the breakout sessions, so in an attempt to understand what exactly is happening in terms of major product releases, I did a textual overview of the videos with links to additional resources. 
This is to highlight some of the brand new Google products and how they can be used to serve new business needs and capabilities.\nFor the full list of videos of all the recorded sessions, refer to the link here:\nhttps://www.youtube.com/playlist?list=PLBgogxgQVM9v0xG0QTFQ5PTbNrj8uGSS-\nDay 1 Keynote Day 2 Keynote Day 3 Keynote Accelerating Your Kubernetes Development with Kubernetes Applications Cloud Functions Overview: Get Started Building Serverless Applications CI/CD for Hybrid and Multi-Cloud Customers Take Control of your Multi-cluster, Multi-Tenant Kubernetes Workloads What\u0026rsquo;s Next for G Suite: Our Areas of Investment and Upcoming Releases Day 1 Keynote # Video Link: https://www.youtube.com/watch?v=vJ9OaAqfxo4 Kubernetes Service Monitoring Istio General Availability Coming! GKE On-Prem Cloud Services Platform Managed Istio - General Availability Filestore Cloud Build G Suite Enterprise AutoML Contact Center AI Day 2 Keynote # Video Link: https://www.youtube.com/watch?v=XiGBWpxc6Lc BigQuery ML Binary Authorization Google Cloud Functions General Availability Google Kubernetes Serverless add-on: Knative Cloud Armor Google Maps Revamp Routes Places Ridesharing Asset Tracking Cloud IoT Cloud IoT Core Edge TPU Cloud IoT Edge Day 3 Keynote # Video Link: Cloud Source Repositories Cloud Build GitHub would recommend Cloud Build as another possible CI tool Contains security profiling checks Profiler Trace Spinnaker OpenCensus Dialogflow Go Cloud Project Cloud Firestore Firebase A/B Testing Firebase Predictions ML Kit for Firebase Cloud AI Adventures Kaggle Kaggle Competitions Kaggle Learn Kaggle Kernels Kaggle Datasets Kaggle Deep Integration into GCP -\u0026gt; ML Models created could be passed into AutoML Unity \u0026amp; GCP on Connected Games KubeVirt mentioned during the tech panel Kubeflow mentioned during the tech panel gVisor mentioned during the tech panel Accelerating Your Kubernetes Development with Kubernetes 
Applications # Video Link: https://www.youtube.com/watch?v=C6koWw0r07Y https://github.com/kubernetes-sigs/application Steps when running applications Status Installation Progress Resource Activity Connecting Site/Admin URL Service Endpoint Credentials Client Commands Operations User Guide Upgrade Backup/Restore Deletion Previously, one had to hunt down and manage those Kubernetes resources manually to run an \u0026ldquo;app\u0026rdquo;; e.g. finding and peeking at the secrets. The Application resource was introduced to encapsulate these resources and provide metadata that makes the whole picture easier to understand. Helm was an alternative choice; however, the resources are still represented as pods, services and other native Kubernetes resources. Hence, if one needs to debug/operate the whole \u0026ldquo;application\u0026rdquo;, there is some digging around that needs to be done to get going. Resources Pods Persistent Volumes, Persistent Volume Claims Services Deployments, ReplicaSets StatefulSets DaemonSets ConfigMaps, Secrets Ingress Example of a Kubernetes application highlighted in the Google Cloud Console. Look at the portion on the right to see how the application section is highlighted to give more context about the application. 
Cloud Functions Overview: Get Started Building Serverless Applications # General Overview Video Link: https://www.youtube.com/watch?v=JenJQ6gc14U Details Video Link: https://www.youtube.com/watch?v=Y1sRy0Q2qig Cloud Scheduler available Google Cloud Functions in General Availability for Node 6 Environment Node.js 8 and Python 3.7 runtimes coming soon Cloud Storage Events Ubuntu 18.04 Base Image which includes the following libraries libcairo ImageMagick ffmpeg headless browsers Cloud SQL Direct Connect Scaling Controls CI/CD for Hybrid and Multi-Cloud Customers # Video Link: https://www.youtube.com/watch?v=IUKCbq1WNWc First-class support for artifacts, binaries and tarballs File path triggers GitHub PR Support, Checks and API Result Google Cloud Build UI Dashboard Google Cloud Build Workers In the case of an on-prem repo, have the Google Cloud Build Workers use your own pool of Google Compute Engine instances in your own VPC Take Control of your Multi-cluster, Multi-Tenant Kubernetes Workloads # Video Link: https://www.youtube.com/watch?v=LysDry8xpt4 Evolution of how Kubernetes is being used One Cluster per Tenant Multicluster per tenant (Serving multiple regions) + Namespace (Where policy controls etc. can be added to control and standardize access) Multitenant and multicluster development GKE Policy Management Centrally defined policies across all clusters - easier to manage Namespaces are the tenants (the granularity to go for) GKE cluster namespaces are flat, but most organizations are hierarchical in nature Consists of the Policy Importer (gets the policies from various sources, currently only Git and the Google Cloud GUI), Syncer (realizes changes back on the cluster), Quota Controller (allows controlling quota at a group level) What\u0026rsquo;s Next for G Suite: Our Areas of Investment and Upcoming Releases # Video Link: https://www.youtube.com/watch?v=AvEOxA8Y6Tc Security Investigation Tool Data Regions Titan Security Keys (Physical keys) Gmail Updates 
Native Offline Support Compose action add-ons Confidential Mode (expiration date, attachments cannot be downloaded, etc.) Smart Compose in Gmail Calendar Updates Automatic Room Release Meeting Room Insights Calendar Interop (Work between multiple calendars outside of Google) Docs Grammar Correction Sheets Sheets + BigQuery Data Connector. gsuite.google.com/bq-sheets SAP Integration with Sheets Salesforce Integration with Sheets Sheets embedding in Salesforce Sheets Macros (Record how to format and alter data) Explore Tool in Sheets (Ask natural language questions to retrieve insights about the data) Formula Accelerator (Formula Suggestions) New Charting Improvements (Slicers - filter UI, Scorecard chart) Box, Dropbox, Egnyte integrations Meeting Solutions Hangouts Meet Meet Hardware Kit Jamboard Live streaming in Hangouts Meet Assistive voice commands with the Meet hardware kit Adaptive layout for Hangouts Meet Interoperability with Hangouts Meet (partner: Pexip) Virtual whiteboarding in every meeting with Jamboard Jamboard autodraw (Sketch something quickly and Jamboard tries to convert it into proper images) Hangouts Chat Enterprise Content Management Real-time presence in Microsoft Office (Google Drive interop with Word documents - tells you when you can edit or when someone else is editing) Metadata in Drive Approvals in Drive Priority page in Drive Priority Page: Suggested Feed Priority Page: Workspaces PIN code Sharing (Sharing docs with non-Google users) ","date":"29 July 2018","externalUrl":null,"permalink":"/summary-of-google-cloud-next-2018/","section":"Posts","summary":"Google recently held its Google Cloud conference, which features all the exciting new technologies made available in Google Cloud. 
There were numerous product announcements in the various keynotes as well as the breakout sessions, so in an attempt to understand exactly what is happening in terms of major product releases, I did a textual overview of the videos with links to additional resources. This is to highlight some of the brand-new Google products and how they can be used to serve new business needs and capabilities.\n","title":"Summary of Google Cloud Next 2018","type":"posts"},{"content":"This is not an in-depth summary of the talks at Google IO Extended 2018. Rather, these are my notes from attending the conference, which are heavier on links for finding out more about the topics. The IO Extended 2018 event in Singapore ran over 2 days. The talks listed below are from the second day.\nList of talks\nGoing to gRPC from REST Machine Learning - 101 Web Things Android Things Pivotal Cloud Foundry on GCP Going to gRPC from REST # Some resource links:\nhttps://grpc.io https://grpc.io/community/ https://github.com/grpc https://github.com/grpc-ecosystem Multiple middlewares Integrations with monitoring (e.g. Prometheus) Machine Learning - 101 # Some resource links:\nStanford\u0026rsquo;s CS229 by Prof Andrew Y. 
Ng\nhttps://www.youtube.com/watch?v=UzxYlbK2c7E TensorFlow tutorial\nhttps://developers.google.com/machine-learning/crash-course Shogun Toolbox\nhttp://www.shogun-toolbox.org/page/features http://daoudclarke.github.io/machine-learning-practice.html http://www-bcf.usc.edu/~gareth/ISL/ http://stanford.edu/~cpiech/cs221/handouts/kmeans.html https://ipython-books.github.io/81-getting-started-with-scikit-learn/ Web Things # AMP Packager\nhttps://github.com/ampproject/amppackager AMP Stories\nhttps://www.ampproject.org/stories/ AMP Stories Components\nhttps://www.ampproject.org/docs/reference/components/amp-story Experimental: chrome://flags/#enable-desktop-pwas WebXR\nhttps://webxr.io/webar-playground/app/ Some resource links:\nhttps://www.ampproject.org/ https://codelabs.developers.google.com https://developers.google.com https://github.com/immersive-web Android Things # Some resource links:\nhttps://g.co/iotdev https://iot.google.com https://developer.android.com/things Pivotal Cloud Foundry on GCP # Some resource links:\nhttps://pivotal.io/platform https://pivotal.io/platform/pivotal-application-service ","date":"16 July 2018","externalUrl":null,"permalink":"/summaries-from-google-io-extended-2018-singapore-day-2/","section":"Posts","summary":"This is not an in-depth summary of the talks at Google IO Extended 2018. Rather, these are my notes from attending the conference, which are heavier on links for finding out more about the topics. The IO Extended 2018 event in Singapore ran over 2 days. The talks listed below are from the second day.\n","title":"Summaries from Google IO Extended 2018 Singapore - Day 2","type":"posts"},{"content":"This is not an in-depth summary of the talks at Google IO Extended 2018. Rather, these are my notes from attending the conference, which are heavier on links for finding out more about the topics. The IO Extended 2018 event in Singapore ran over 2 days. 
The talks listed below are from the first day.\nList of talks\nIO 2018 Highlights Profiling and Android Vitals Web Presence Chatbots Android Jetpack ExoPlayer Customization Kotlin and Java IO 2018 Highlights # Some of the newer developments highlighted here:\nhttps://cloud.spring.io/spring-cloud-gcp/ https://developers.google.com/web/tools/lighthouse/ iOS was mentioned!! https://firebase.google.com/docs/test-lab/ https://developers.google.com/actions/community/overview https://www.tensorflow.org/hub/ https://js.tensorflow.org https://developers.google.com/machine…/crash-course/ml-intro Profiling and Android Vitals # A talk about managing performance in an Android application. Because users expect more performant applications, there is a need to understand the performance of every aspect of the application, e.g. network usage and the battery consumption of certain sections of the application.\nLinks:\nhttps://developer.android.com/topic/performance/vitals/ https://medium.com/@RenuYadav/android-vitals-an-initiative-for-good-health-12469a06fdb9 Web Presence # Links for resources from this session:\nhttps://developers.google.com/search/mobile-sites/mobile-first-indexing Chatbots # There are two chatbot talks in this segment:\nIO Extended 2018 chatbot Eddy the Eagle chatbot The main technology powering the chatbots is Dialogflow.\nWhen a chatbot receives a text from a user, it needs to send it to a \u0026ldquo;server\u0026rdquo; for processing. One of the cheaper ways to handle this is via Cloud Functions (a serverless option). After the initial processing, the text can be sent over to Dialogflow, which then works out which intent the text corresponds to. The intent is returned to the serverless function, which then responds to the user accordingly.\nThe IO Extended 2018 chatbot revolves around only Dialogflow and Firebase Cloud Functions. 
However, the Eddy the Eagle chatbot shows how chatbots can be truly useful in everyday life. The Eddy the Eagle chatbot aims to allow students at a school to quickly look up lists of homework or lesson schedules rather than going through a bunch of links just to retrieve the information they need.\nLinks to additional resources:\nhttps://dialogflow.com/ https://firebase.google.com/docs/functions/ https://www.slideshare.net/SohitGatiganti/eddy-the-eagle-the-student-chatbot-104725786 https://github.com/sohit39/SAS_Chatbot https://github.com/yogendra/io-ext-sg-2018 Android Jetpack # Links for this session:\nhttps://android.jlelse.eu/introduction-to-android-architecture-components-with-kotlin-room-livedata-1839c17597e https://github.com/googlesamples/android-sunflower https://github.com/googlesamples/android-UniversalMusicPlayer https://github.com/googlesamples/android-architecture-components https://codelabs.developers.google.com/?cat=Android https://developer.android.com/topic/libraries/architecture/ ExoPlayer Customization # On Android, there is a media player object that can be used to play videos. However, it is quite inflexible, and it is difficult to use when it comes to managing and handling video playback at scale.\nLinks to some of the resources out there:\nhttps://github.com/google/ExoPlayer https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP Kotlin and Java # Getting Kotlin and Java to play nicely together while developing an Android application\nAdditional References:\nhttps://developer.android.com/kotlin/ktx https://github.com/android/android-ktx ","date":"11 July 2018","externalUrl":null,"permalink":"/summaries-from-google-io-extended-2018-singapore-day-1/","section":"Posts","summary":"This is not an in-depth summary of the talks at Google IO Extended 2018. Rather, these are my notes from attending the conference, which are heavier on links for finding out more about the topics. The IO Extended 2018 event in Singapore ran over 2 days. 
The talks listed below are from the first day.\n","title":"Summaries from Google IO Extended 2018 Singapore - Day 1","type":"posts"},{"content":"The following summaries are from KubeCon and CloudNativeCon Europe, held in Denmark from 2-4 May 2018.\nThese summaries are from conference talks that I thought provided more interesting thinking points.\nThe videos for the conference can be found here:\nhttps://www.youtube.com/watch?v=OUYTNywPk-s\u0026list=PLj6h78yzYM2N8GdbjmhVU65KYm_68qBmo\nBelow are some of the talks that I found quite interesting (just my own preference).\nI took some personal notes so that I don\u0026rsquo;t need to rewatch the videos just to get their main points.\nAnatomy of a Production Kubernetes Outage Cloud Native Landscape Intro Accelerating Kubernetes Native Applications Kubernetes Project Update The Challenges of Migrating 150+ microservices Container-Native dev and ops experience Container Native observability \u0026amp; security from Google Cloud Continuously Deliver your Kubernetes Infrastructure Anatomy of a Production Kubernetes Outage # Blog post on the production outage that occurred: https://community.monzo.com/t/resolved-current-account-payments-may-fail-major-outage-27-10-2017/26296/95?u=alexs Another blog post: https://community.monzo.com/t/anatomy-of-a-production-kubernetes-outage-presentation/37331 In summary: checking compatibility between the platform and its tools is vital, especially at the platform level, where incompatibilities can cause cascading failures across the applications. Fallbacks for when systems fail are helpful; in the case above, applications failed but transactions continued running. 
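As a rough illustration of the fallback idea from the Monzo summary above, here is a minimal Go sketch. It is not taken from the talk or from Monzo; the Quote type, the Fetcher type, unavailableError and the WithFallback helper are hypothetical names chosen for this example. The wrapper calls the primary dependency first. It serves the fallback result only when the primary returns an error, so a fault in one dependency does not surface as a user-facing failure.

```go
package main

// Quote is a hypothetical response type used only for this sketch.
type Quote struct {
	Price  int
	Source string
}

// Fetcher is a hypothetical call to a dependency that may fail.
type Fetcher func() (Quote, error)

// unavailableError is a hypothetical error a primary dependency might return.
type unavailableError struct{}

func (unavailableError) Error() string { return `dependency unavailable` }

// WithFallback returns a Fetcher that tries primary first and, only if
// primary returns an error, serves the result of fallback instead.
func WithFallback(primary, fallback Fetcher) Fetcher {
	return func() (Quote, error) {
		if q, err := primary(); err == nil {
			return q, nil
		}
		return fallback()
	}
}
```

Because WithFallback itself returns a Fetcher, wrappers compose: a fallback can in turn be wrapped with another fallback. Which alternative is acceptable to serve (a cached value, a default, a degraded response) is a product decision rather than a purely technical one.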
Cloud Native Landscape Intro # Introduction to the Cloud Native Landscape tools and GitHub page GitHub Link: https://github.com/cncf/landscape Website Link: https://landscape.cncf.io/ Get the PDF versions of the landscape from GitHub Accelerating Kubernetes Native Applications # Operators are a concept built on top of Kubernetes Custom Resource Definitions They allow for application-specific management; e.g. managing a running database: if a database needs to be resized, an operator could be programmed to trigger a snapshot before switching to a bigger pod that the data can be replicated into (example only). Reasons why operators are game changing: https://dzone.com/articles/why-kubernetes-operators-are-a-game-changer Additional links: https://medium.com/@mtreacher/writing-a-kubernetes-operator-a9b86f19bfb9 Operator Framework by CoreOS: https://coreos.com/operators/ GitHub link to the Operator SDK: https://github.com/operator-framework/operator-sdk Kubernetes Project Update # Security Network Policy Encrypted Secrets RBAC TLS Cert Rotation Pod Security Policy Threat Detection (not really part of Kubernetes - GKE Cloud Security Command Center) Sandbox Applications (providing a tiny kernel for the container - gVisor) Applications Batch Applications Workload Controllers, Local Storage GPU access Container Storage Interface (mention of a Spark operator - software which manages the running of a Spark cluster) Stackdriver: integrates deeply with Prometheus Developer Experience Skaffold (allows a debugger to be attached, enabling interactive debugging with custom deployments) The Challenges of Migrating 150+ microservices # Tools out there tend to follow the same cycle: Genesis -\u0026gt; Custom Built solutions -\u0026gt; Product Offering -\u0026gt; Commodity. 
Chart from here: https://medium.com/wardleymaps/anticipation-89692e9b0ced Link to whole blog post: https://medium.com/wardleymaps When companies are big, moving and innovating becomes expensive (it is not a technology problem but more of a human, community and company problem). So essentially, one can think of this in terms of innovation tokens: tokens that should be spent wisely, or failure will result. Choose boring technology. http://mcfunley.com/choose-boring-technology One way to reduce risk is to run the applications on 2 parallel stacks, but this is very expensive in terms of complexity and human effort. When doing this, one needs to take note of the costs of this kind of test Such tests have an impact on cost - it might be good to rope in the relevant people on the test being run, the hypothesis of what should happen and the benefits the company will get Container-Native dev and ops experience # Talk about the following tool: https://github.com/Azure/draft Container Native observability \u0026amp; security from Google Cloud # Talk about the following tools: gVisor - a sandbox that helps protect against host kernel vulnerabilities such as Dirty COW Stackdriver support - Deep Prometheus integration - it can import metrics from Prometheus into Stackdriver, providing a single pane of glass to view all monitored applications in one tool Podcast: https://kubernetespodcast.com/ Blog post talking about the podcast: https://cloudplatform.googleblog.com/2018/05/introducing-kubernetes-podcast-from-google.html Continuously Deliver your Kubernetes Infrastructure # Philosophy for setting up Kubernetes clusters No pet clusters (No special custom configuration across 80 clusters) Always provide the latest stable Kubernetes version Continuous and non-disruptive cluster updates \u0026ldquo;Fully\u0026rdquo; automated operations (Able to redeploy by just doing PRs) Cluster setup Provision in AWS via CloudFormation Etcd stack outside Kubernetes Container Linux Multi-AZ worker nodes HA control 
plane setup behind ELB Cluster configuration in Git e2e tests on Jenkins Cluster registry List of clusters available for access https://github.com/zalando-incubator/kubernetes-on-aws https://github.com/zalando-incubator/cluster-lifecycle-manager Multiple \u0026ldquo;channels\u0026rdquo; of Kubernetes Cluster upgrades move through dev, alpha, then beta clusters dev (cluster to play around with) alpha (main infrastructure cluster used by the infrastructure team for testing) beta (main cluster the rest of the org uses) Has e2e tests Conformance tests (https://github.com/cncf/k8s-conformance) StatefulSet tests (test volume attachment, possibly using a Redis cluster?) Has monitoring on each cluster to ensure correct behaviour https://github.com/mikkeloscar/kubernetes-e2e Hints for running e2e tests Run with flake attempts=2, since some tests can fail due to autoscaling Update e2e images with each release of Kubernetes Disable broken e2e tests with the -skip parameter Remove completed pods from kube-system to make room for other test pods (to save money) ","date":"16 May 2018","externalUrl":null,"permalink":"/lessons-from-kubecon/cloudnativecon-2018-europe/","section":"Posts","summary":"The following summaries are from KubeCon and CloudNativeCon Europe, held in Denmark from 2-4 May 2018.\nThese summaries are from conference talks that I thought provided more interesting thinking points.\n","title":"Lessons from Kubecon/CloudNativeCon 2018 Europe","type":"posts"},{"content":"This is the list of talks from the recent GopherCon conference held in Singapore on 4th May 2018:\nGo with Versions Project-driven journey to learning Go Resiliency in Distributed Systems Understanding Running Go Program Go for Grab Optimize for Correctness Build your own distributed database The Scandalous Story of Dreadful Code Written by the Best of Us Erlang for Go developers Go and the future of offices Reflections on Trusting Trust for Go The lost art of bondage Below are some of the more 
interesting points raised during the talks (view the full talk to understand the context of what was raised and why).\nThe list of videos from GopherCon can be found here:\nhttps://engineers.sg/conference/gopherconsg-2018\nGo with Versions # Versioning in Go has always been lacking. The Go community combined efforts to create Dep, a package management tool that implements the usual package management found in languages like Ruby and Python (Bundler and pip respectively). It includes a config file, Gopkg.toml, as well as a lock file, Gopkg.lock. Several use cases showed how using Dep can result in fissures in the package management ecosystem: when asked to upgrade, the tool takes the latest version of a package even if that version causes breakages, so locks were needed to prevent it from picking up bad versions. Refer to the following issue for the full discussion on the vgo proposal:\nhttps://github.com/golang/go/issues/24301 Resiliency in Distributed Systems # Microservice talk by Go-Jek Faults vs Failures: a fault is a state where the system is unhealthy but still working; a failure means that users are not able to interact with the system. Faults could come from: Database slowdown Memory leaks Blocked threads Dependency Failure Bad Data coming in/going through the system https://github.com/gojektech/heimdall The capability of a system to prevent faults from turning into failures is called resiliency Ways to handle it: Timeouts (never ever wait for a client/server forever) Retries (a system that can eventually recover without needing manual intervention to retry) Circuit Breakers (prevent a stampeding herd across the system) Fallbacks (e.g. does the service really need to be up, or can an alternative be served in the meantime? Third-party integrations can have their alternatives served when the primary integration fails.) 
Resiliency Testing (using Chaos Monkey-style systems to do test runs and see what happens when things go wrong) Rate limiting/throttling (prevent a stampeding herd situation so that failures in parts of the system don\u0026rsquo;t cascade over to other parts of the system) Bulkheading Queueing (queues slow the system down, reducing stress on the parts of the system where fast and immediate responses are not needed) Monitoring/Alerting Canary releases (release new versions of the software slowly: release to a small percentage and check whether errors spike; if not, release to a bigger and bigger group until it becomes the version running in the majority of the system) Redundancies Go for Grab # Internally, they have a toolkit called Grabkit which they use for bootstrapping their microservice applications. The toolkit was inspired by Go kit. The talk covered microservices and how distributed applications became problems that had to be solved at a company-wide level Interesting point raised: Make your functions accept context: you\u0026rsquo;ll be glad you did Debugging and root cause analysis are difficult - central logging systems as well as good monitoring and alerting systems would be helpful Optimize for Correctness # Article that was used in the presentation: https://github.com/ardanlabs/gotraining/blob/master/topics/go/README.md More code == more bugs; less code is better Every decision made comes at a cost; more abstractions might result in more complexity, making it difficult to predict the performance of the code, etc. The Lost Art of Bondage # Some C applications are just too expensive to be ported over to Go; instead, bindings are introduced. Go has a tool called cgo which can interface with such C code. 
An example of C code interfaced this way that was brought up during the talk is CUDA. ","date":"9 May 2018","externalUrl":null,"permalink":"/lessons-from-gophercon-sg/","section":"Posts","summary":"This is the list of talks from the recent GopherCon conference held in Singapore on 4th May 2018:\nGo with Versions Project-driven journey to learning Go Resiliency in Distributed Systems Understanding Running Go Program Go for Grab Optimize for Correctness Build your own distributed database The Scandalous Story of Dreadful Code Written by the Best of Us Erlang for Go developers Go and the future of offices Reflections on Trusting Trust for Go The lost art of bondage Below are some of the more interesting points raised during the talks (view the full talk to understand the context of what was raised and why).\n","title":"Lessons from Gophercon SG","type":"posts"},{"content":"View the full list of FOSSASIA video recordings on the engineers.sg website: https://engineers.sg/conference/fossasia-2018\nHere are some of the videos I particularly liked. I summarized some of the interesting points from these videos.\nTopic: Everything as code https://engineers.sg/video/everything-as-code-fossasia-2018--2409 Instead of using GUI tooling to define architecture, have everything be specified in code instead. This opens up many more ways to work with it. One of the benefits is being able to use the full set of tools that programmers use to work on code (git, repositories, versioning, etc.). Another benefit is that repeatable builds become possible; it is hard to replicate the same actions in the GUI of infrastructure platforms. 
","date":"2 May 2018","externalUrl":null,"permalink":"/summaries-from-fossasia-conference-in-singapore/","section":"Posts","summary":"View the full list of FOSSASIA video recordings on the engineers.sg website: https://engineers.sg/conference/fossasia-2018\nHere are some of the videos I particularly liked. I summarized some of the interesting points from these videos.\n","title":"Summaries from Fossasia conference in Singapore","type":"posts"},{"content":"Out of random curiosity (and laziness on my part), I decided to create a CLI tool which would allow me to create tasks on task management websites such as Asana, and issues in GitHub/Bitbucket.\nI would be writing it in Golang, seeing that it allows a CLI tool executable to be built without too much trouble. This would mean less work when it comes to distribution (it may need to be cross-compiled if necessary), but for now, I will be aiming it at people on Macs.\nScoping out work # Before starting to write the CLI tool, we first need to plan out what is expected of the product.\nIn the list below, the word platform can stand for any of the platforms we are targeting:\nAllow one to create tasks on a platform with labels and a deadline Allow one to list tasks on a platform for a specific user Allow one to list tasks for a project Allow one to create tasks on multiple platforms at the same time Allow one to copy tasks across platforms The struct could be described as follows; we can\u0026rsquo;t load up too many fields, as that would make it non-interoperable across multiple task management platforms.\ntype Task struct { Name string Description string Label string Deadline time.Time } The first platform to target would be Asana, before moving on to other platforms.\nPreviewing the final tool # I would imagine that the tool could look something like this:\n(The name of the tool is still up in the air; I haven\u0026rsquo;t decided on anything concrete yet)\n# Create task 
tasker create -name=\u0026#34;This is a test task\u0026#34; -desc=\u0026#34;We would need to try building this product properly\u0026#34; -label=\u0026#34;low priority\u0026#34; # List task tasker list -tool=\u0026#34;asana\u0026#34; # List task for a project tasker list -tool=\u0026#34;asana\u0026#34; -proj=\u0026#34;random\u0026#34; # Create task for multiple platforms at the same time tasker create -name=\u0026#34;This is a test task\u0026#34; -desc=\u0026#34;We would need to try building this product properly\u0026#34; -label=\u0026#34;low priority\u0026#34; -tool=\u0026#34;asana,github\u0026#34; # Copy task between platforms # Needs to be made easier though tasker cp -originTool=\u0026#34;asana\u0026#34; -id=\u0026#34;12\u0026#34; -destTool=\u0026#34;github\u0026#34; Some of the tasks could be made easier by using a config file to control the behaviour of the tool, although that would make it harder to understand what\u0026rsquo;s happening. Some of the things to think about would be:\nSet a primary tool and list the rest as secondary tools Have a setting that controls whether an issue is created only on the primary tool and not on the secondary tools, or vice versa Actual implementation # We will leave the actual implementation and a description of the problem for another blog post on another date; look forward to it!!\n","date":"22 April 2018","externalUrl":null,"permalink":"/lets-use-cli-to-create-tasks/","section":"Posts","summary":"Out of random curiosity (and laziness on my part), I decided to create a CLI tool which would allow me to create tasks on task management websites such as Asana, and issues in GitHub/Bitbucket.\n","title":"Let's use CLI to create Tasks!!","type":"posts"},{"content":"","date":"15 April 2018","externalUrl":null,"permalink":"/tags/google-analytics/","section":"Technology Tags","summary":"","title":"Google Analytics","type":"tags"},{"content":"","date":"15 April 
2018","externalUrl":null,"permalink":"/tags/google-tag-manager/","section":"Technology Tags","summary":"","title":"Google Tag Manager","type":"tags"},{"content":"","date":"15 April 2018","externalUrl":null,"permalink":"/categories/react/","section":"Article Categories","summary":"","title":"React","type":"categories"},{"content":"This is going to be a pretty short post, but it should prove useful if you are already familiar with the tool.\nQuick intro of normal website tracking # When one navigates through a normal server-rendered website that is using Google Tag Manager or Google Analytics (assuming that it is set up right), a page view hit is sent to the Google Analytics server as each page loads. This is normal, familiar behaviour for most people who have used the tools.\nHowever, if the website is a single-page application, the situation changes completely. If one sets up the tracking tool as usual to track page views and then debugs the setup, page view hits only get fired on the initial load. As the user navigates through the website, no further page view hits are sent. The reason is that navigating to a new view doesn\u0026rsquo;t technically load a new web page, so the tracking hits are not triggered.\nPreviously, solutions for this included creating virtual page views deeply embedded in the application. This requires developers to construct the page view hit themselves and fire it as a page view; in practice, it is much like firing a tracking hit in response to events.\nGoogle Tag Manager to the rescue? # I\u0026rsquo;m going to switch to talking about Google Tag Manager now, which is the main tool I turn to when it comes to website tracking. It is pretty excellent, as it allows one to embed tracking/advertisement tags without having developers add them at the end of projects. 
I won\u0026rsquo;t go too deep into Google Tag Manager concepts such as Tags, Triggers and Variables here; however, it is necessary to know them to appreciate the rest of the post.\nWithin Google Tag Manager, there is a predefined trigger type called History Change. This is the one we need to take note of as we delve into how React applications handle page changes. In React single-page applications, we would normally use a library called React Router DOM, which in turn uses the history library, which relies on the browser\u0026rsquo;s History API. Read the webpage below for more information.\nhttps://developer.mozilla.org/en-US/docs/Web/API/History_API\nWith that in mind, all we need to do is set up a Google Analytics tag as usual, but instead of relying only on the normal trigger that fires on page load, we also need it to trigger on every history change. This is sufficient to get all the required page view data into Google Analytics.\n","date":"15 April 2018","externalUrl":null,"permalink":"/using-google-tag-manager-in-react-web-application/","section":"Posts","summary":"This is going to be a pretty short post, but it should prove useful if you are already familiar with the tool.\nQuick intro of normal website tracking # When one navigates through a normal server-rendered website that is using Google Tag Manager or Google Analytics (assuming that it is set up right), a page view hit is sent to the Google Analytics server as each page loads. 
This is familiar behaviour for most people who have used these tools.\n","title":"Using Google Tag Manager in React Web Application","type":"posts"},{"content":"","date":"15 April 2018","externalUrl":null,"permalink":"/categories/web/","section":"Article Categories","summary":"","title":"Web","type":"categories"},{"content":"This application is based on a previous blog post on the Bookcase application.\nThe link to the code base of the application:\nhttps://github.com/hairizuanbinnoorazman/golang-web-gin-book-store\nThere is a chance that when you visit the code base, the application is not fully operational; I am still adding functionality to it.\nLearning 0: Tools/Libraries # Some of the tools/libraries below would be useful when building the Bookcase application:\nViper - Reading Configuration Files Negroni - Middleware Management Sendgrid - Transactional Email JWT Tokens - Auth token library Learning 1: Structuring the application # An important thing to note here is that we want to structure our app so that its components are extensible; this gives us the opportunity to swap out different parts of the architecture. An example of such a scenario is the usage of the database in the application.\nNormally, models in an MVC-style architecture have a save function that persists the model to the database. A good example of this is the Ruby on Rails framework. There is even a shell that allows you to manipulate data this way with relative ease.\nUnfortunately, tying the database to the model object has several consequences, one of which is that it becomes hard to test the model without database mocks or a test database in place. This makes it hard to test the logic in the model.\nRather than relying on end-to-end testing, we will focus on the part that really needs such tests: the logic. 
For now, one can assume that saving data into the database or other persistent system just works.\nWe would do a few things when structuring the application:\nPut all of our logic (or as much as possible) into our models; in this case, these would be structs. Have a service layer that interfaces our models and logic with a persistence system (it could be in memory, a file system or even a database). The service layer does nothing except take the model/entity that it is supposed to \u0026lsquo;service\u0026rsquo; and save/store it. Our controller would consume the concrete service (defined through structs that have methods attached to them) and would be concerned only with transport. TDD immediately available # With such a structure, we can approach the app in TDD style. It\u0026rsquo;s slightly unfortunate that we don\u0026rsquo;t have any complex algorithms to really put the approach to the test, but this does provide an interesting structure to work with.\nFrom the blog post on the requirements of the bookcase application, we already have some sort of spec. We can convert the spec (the constraints on each model) into tests and get them out of the way first. This video covers how we can approach this scenario: TDD for those who don\u0026rsquo;t need it\nAs a matter of convenience, we will use the fairly decent approach of table-driven tests: a table of test cases that each model\u0026rsquo;s implementation is tested against. Refer to this YouTube video for inspiration: Advanced Testing with Go\nAdditional challenges # (Not yet implemented in the code base - it will be slowly substituted in)\nFor an additional challenge when making this application, we would try to see if we can add the following:\nUtilize multiple data storage options. For some of the endpoints, we can see if we can make multiple implementations of the same service, making it easier to switch to (or refactor for) 3rd party storage components. 
Redis MySQL Google Datastore Create an endpoint which is customized for a view (a subset of a domain model or a subset of a joined data model). Test the implementation of such an endpoint. Learning 2: Applying the decorator pattern # For every application that is to be deployed, we have to do a few final steps before the app is fully production ready. These concerns affect every application (some would call them cross-cutting concerns). Some of the things to be added are:\nLogging (Different granularity of logging) Application Metrics (Sending function call counts etc to a Prometheus server) Tracing (In microservices -\u0026gt; OpenTracing) Properties such as Circuit Breaking/Retry logic are vital to have in a microservices setup. However, if we were to deploy such services in Kubernetes, we can rely on Istio or other service meshes to handle them at the cluster level; we would take on some latency, but unless it\u0026rsquo;s absolutely necessary to respond at blazing speeds, it is more or less a resolved issue.\nLearning 3: All fields in structs that are to be stored in DB need to be public # Although it would be ideal for certain fields such as password to be private, both to prevent them from being overwritten in a haphazard manner and to reduce the number of exported fields, the fields need to be public to allow functions in other packages to make use of them as well.\nThis affects any language which has public/private fields in its class/struct definitions, even Java. A side note: in Java, the Hibernate ORM library is able to access private fields. It accesses them via reflection -\u0026gt; I would assume you would need to do the same thing in the Go world if you want the same behaviour. 
However, this adds unnecessary complexity to an application, making it slower as well as more fragile.\nAlternatively, we can rearrange the whole application to fit the need by changing the folder and package structure:\nList of folders in current version:\nModels (Handles the domain structs as well as functions that do validation, checks and other calculations) Services (Handles the integration of domain structs with 3rd party components such as APIs, libraries as well as DB) Controllers (Calls the service by providing a concrete service method) Possible alternative way:\nDomains (e.g. User) File that contains struct declarations File that implements the integration between the DB and the structs Test files The alternative approach has a few disadvantages compared to the current one; one of them is that it restricts our use of external libraries and packages to help build our software. There is no guarantee that external packages are able to read private fields like Hibernate does.\nLearning 4: Inspiration from usage of GORM library # The GORM library has a pretty nice way of handling relationships between model structs. An initial version of the design was quite restrictive:\nLet\u0026rsquo;s take an example of an item in the store:\ntype Item struct { ID string Name string Description string CategoryID string SubCategoryID string CreatedAt time.Time UpdatedAt time.Time } The CategoryID and SubCategoryID fields indicate how the item is related to those models. However, this means that we can\u0026rsquo;t manipulate an item\u0026rsquo;s category or subcategory data via the item struct. We would have to query the database to retrieve the category struct linked to this item before we can modify it. 
There are too many steps involved.\nIf one uses the GORM library, observe how relationships are represented in the struct:\ntype Item struct { ID string Name string Description string Category Category CategoryID string SubCategory SubCategory SubCategoryID string CreatedAt time.Time UpdatedAt time.Time } We now see the Category and SubCategory structs also being part of the Item struct. This allows us to treat the item as one single entity, which makes it more natural to manipulate.\n// We can query for parts of the Item struct this way: // retrievedItem is Item data retrieved from database fmt.Println(retrievedItem.Category.Name) Notice that the CategoryID and SubCategoryID fields still need to be defined. Without them, those foreign keys won\u0026rsquo;t be stored in the database. Read the GORM documentation for more details.\n","date":"8 March 2018","externalUrl":null,"permalink":"/a-sample-bookcase-application-case-via-gin-golang-framework/","section":"Posts","summary":"This application is based on a previous blog post on the Bookcase application.\nThe link to the code base of the application:\nhttps://github.com/hairizuanbinnoorazman/golang-web-gin-book-store\nThere is a chance that when you visit the code base, the application is not fully operational; I am still adding functionality to it.\n","title":"A sample bookcase application case via Gin Golang Framework","type":"posts"},{"content":"We will try to implement a common web application scenario using various technology stacks and libraries. In this case, we will implement it for the following scenario.\nIntroduction # The web application contains the following features:\nAn E-Commerce web application (On a high level overview) For backend portions done in languages such as Python or Golang or Java, only the API backend will be built. A stack that has both frontend and backend in one will not be used here. 
Email integration for certain user interactions on the website API Endpoints have permission checks. Probably can use the decorator pattern to set which permission protects each API -\u0026gt; Error 403 for requests that lack the required permission. User Registration Flow\nUser signs up with email and password. Email sent to user User can go into portal (but does not have permission to do much?) - user is still inactive User activates account by clicking on link in the email User is now active User forgot password\nUser clicks forgot password Email sent to user (Forgot password expiry activated with its respective token) After clicking on the link in the forgot password email, allow the password change if it is before the expiry time; otherwise, no change to the password Initial thoughts on construction of the API layer:\nProducts (List, Get, Add, Subtract, Modify - qty, description, status, etc) User (List, Get, Add, Modify) - Should not have delete operation Orders (List, Get, Add, Modify) Subscriptions/Wishlist (List, Get, Add, Modify) Admin Layer (Roles) - Administration of store information\nAuditor -\u0026gt; Can view all information - no edit Company Management -\u0026gt; Can view all information - no edit Store Manager -\u0026gt; Can view all information relating to his store - no edit Store Admin -\u0026gt; Can view/edit/append some things that are part of his store Store Supplier -\u0026gt; Can view/append some things that are part of his supplier Store Worker -\u0026gt; Can view/edit/append some things that are part of that store (under approval for edits) - limited view Promotion System\nAble to have a flexible set of promotions in the store; e.g. 
3 items for $2 20% storewide 15% if user bought a product from a certain subcategory from the store on a certain date List of database fields # User # Fields\nID First Name Last Name Email Password Facebook ID (Optional) Twitter ID (Optional) Google ID (Optional) Permissions isActive status activationCode forgotPasswordExpiry forgotPasswordToken Created Time Updated Time Last Login Time Some constraints set on the user struct:\nID: UUID. It has to match a UUID regex pattern during validation First name: Must not be empty, must be less than 100 characters Last name: Must not be empty, must be less than 100 characters Email: Must follow the email regex (Includes @ and domain at the end) Password: Password length \u0026gt; 8; Must contain at least one lowercase letter, one uppercase letter and one number Role # Fields\nID Name of role Description Status (Is it active?) Remarks Created Time Updated Time Some constraints set on the role struct:\nID: UUID. It has to match a UUID regex pattern during validation Name of role: Must not be empty, must be any one of the following strings: [\u0026lsquo;admin\u0026rsquo;, \u0026lsquo;member\u0026rsquo;, \u0026lsquo;editor\u0026rsquo;, \u0026lsquo;view\u0026rsquo;] Description: Must not be empty, Text field Status: Only the following strings are allowed: [\u0026lsquo;active\u0026rsquo;, \u0026lsquo;inactive\u0026rsquo;, \u0026lsquo;depreciated\u0026rsquo;] Role x Permission Mapping # Fields (Many:Many relationship)\nRole ID Permission ID Items # Fields\nID Name of Product Short Description Long Description Product Category (Foreign Key) Product Subcategory (Foreign Key) Status Remarks Created Time Updated Time Some constraints\nID: UUID. It has to match a UUID regex pattern during validation Name: Cannot be empty, Shorter than 150 characters Short Description: Cannot be empty. Shorter than 150 characters Long Description: Cannot be empty. 
Text field Product Category: It has to be part of a valid product category Status: Only the following strings are allowed: [\u0026lsquo;active\u0026rsquo;, \u0026lsquo;inactive\u0026rsquo;, \u0026lsquo;depreciated\u0026rsquo;] Product Category # Fields\nProductCategoryID Name of Product Category Description Status Remarks Created Time Updated Time Some constraints\nID: UUID. It has to match a UUID regex pattern during validation Name: Cannot be empty, Shorter than 150 characters Short Description: Cannot be empty. Shorter than 150 characters Status: Only the following strings are allowed: [\u0026lsquo;active\u0026rsquo;, \u0026lsquo;inactive\u0026rsquo;, \u0026lsquo;depreciated\u0026rsquo;] Product Subcategory # Fields\nProduct Subcategory ID Name of Product Subcategory Description Status Remarks Product ID Created Time Updated Time Some constraints\nID: UUID. It has to match a UUID regex pattern during validation Name: Cannot be empty, Shorter than 150 characters Short Description: Cannot be empty. Shorter than 150 characters Status: Only the following strings are allowed: [\u0026lsquo;active\u0026rsquo;, \u0026lsquo;inactive\u0026rsquo;, \u0026lsquo;depreciated\u0026rsquo;] Supplier ID # ID Supplier Name Description Supplier Main Contact Supplier Secondary Contact Supplier Email Address Created Time Updated Time Some constraints\nID: UUID. It has to match a UUID regex pattern during validation Status: Only the following strings are allowed: [\u0026lsquo;active\u0026rsquo;, \u0026lsquo;inactive\u0026rsquo;, \u0026lsquo;depreciated\u0026rsquo;] ","date":"7 March 2018","externalUrl":null,"permalink":"/a-sample-bookcase-application-case/","section":"Posts","summary":"We will try to implement a common web application scenario using various technology stacks and libraries. 
In this case, we will implement it for the following scenario.\n","title":"A sample bookcase application case","type":"posts"},{"content":"This is a personal list of Golang resources I like to keep track of. This is an evergreen list, so I will update it once in a while when new stuff pops up.\nLibraries # Libraries that help in handling web applications/authorization/authentication Gin Gorilla Gokit jwt-go buffalo tablewriter - write tables in markdown terminal dashboard - dashboard on bash terminal dashboard for processes - dashboard on bash but for processes PromptUI - CLI Prompts Cobra - CLI Toolkit Tools # dep The current dependency tool, which is the official experiment for Go dependency management. godoc Part of the Go tooling. When you add documentation comments above function names, they show up in the generated documentation, which other developers can consume. Visual Studio Code Golang extension It requires installing a whole set of Go tools, including go vet, which checks that certain heuristics are followed to help keep code readable and easy to extend. Blogs # Russ Cox\u0026rsquo;s Blog\nhttps://research.swtch.com/ Dave Cheney\u0026rsquo;s Blog\nhttps://dave.cheney.net Mat Ryer\u0026rsquo;s Medium Posts\nhttps://medium.com/@matryer Video Tutorials # Just for func YouTube channel:\nhttps://www.youtube.com/channel/UC_BzFbxG2za3bp5NRRRXJSw Golang UK Conference Videos:\nhttps://www.youtube.com/channel/UC9ZNrGdT2aAdrNbX78lbNlQ/videos Actor models Golang:\nhttps://www.youtube.com/watch?v=LHe1Cb_Ud_M\nhttps://www.youtube.com/watch?v=yCbon_9yGVs ","date":"28 February 2018","externalUrl":null,"permalink":"/favourite-golang-resources/","section":"Posts","summary":"This is a personal list of Golang resources I like to keep track of. 
This is an evergreen list, so I will update it once in a while when new stuff pops up.\n","title":"Favourite Golang Resources","type":"posts"},{"content":"An excellent resource on refactoring Go code safely, so that changes do not break the codebase.\nhttps://talks.golang.org/2016/refactor.article\nAn important takeaway from the article is that when making API changes to a codebase, the largest amount of work is the code repair that needs to be done. Here are some examples to take note of:\nThese types of code repair refactors are likely to happen in the Go standard library.\nMoving constants across packages - part of code repair const OldAPIConstant = NewPackage.APIConstant Moving functions across packages - part of code repair // Keep OldAPI\u0026#39;s signature and forward to the new package func OldAPI() { NewPackage.NewAPI() } Moving vars across packages - part of code repair var OldAPIVariable = NewPackage.APIVariable Moving types across packages - part of code repair type OldAPIType = NewPackage.APIType Here are some additional refactoring notes:\nDoing the following adds plenty of extra code to your codebase. If you are sure no one else is using the codebase (e.g. it\u0026rsquo;s a private repository that nobody else depends on), it might be fine to just add, remove or make other code edits that would usually cause breakages.\nFor changes that do require breaking the API, one extra step is to add a deprecation note to the affected functionality, with some information on why the function or variable is deprecated.\nAdd a new field to a struct safely (Don\u0026rsquo;t deprecate it yet - adding new fields might result in unexpected behaviour?) 
type Planet struct { Name string `json:\u0026#34;name\u0026#34;` Radius float64 `json:\u0026#34;radius\u0026#34;` } type PlanetWithMass struct { Planet Mass float64 `json:\u0026#34;mass\u0026#34;` } Add a new parameter to a function - note (This is temporary; once it is ok to do a major release, the past versions can be cleaned out) // Test1 is the old function - move code to new function // Test1WithOwner is the new function func Test1(name string) { // fmt.Println(name) - past code - move it to the new function or a common function that has been extracted sufficiently. Test1WithOwner(name, \u0026#34;\u0026#34;) } func Test1WithOwner(name, owner string) { fmt.Println(name) if owner != \u0026#34;\u0026#34; { fmt.Println(owner) } } ","date":"21 February 2018","externalUrl":null,"permalink":"/refactoring-go-safely/","section":"Posts","summary":"An excellent resource on refactoring Go code safely, so that changes do not break the codebase.\nhttps://talks.golang.org/2016/refactor.article\nAn important takeaway from the article is that when making API changes to a codebase, the largest amount of work is the code repair that needs to be done. Here are some examples to take note of:\n","title":"Refactoring Go Safely","type":"posts"},{"content":"A list of conferences, meetups and exhibitions to look out for, especially in 2018:\nThis is a personal list that I\u0026rsquo;m keeping track of; it mainly revolves around Golang, modern architecture technologies e.g. 
cloud technologies etc, Python and even R (one of my first languages; I still keep an eye on how it\u0026rsquo;s doing nowadays.)\nConferences # RStudio Conference (January 2018) Conference Details: https://www.rstudio.com/conference/ Materials: https://github.com/rstudio/rstudio-conf/tree/master/2018 Video Link: https://www.rstudio.com/resources/webinars/#rstudioconf2018 Fossasia Conference (March 2018) Website: https://2018.fossasia.org/ Video Link: https://engineers.sg/conference/fossasia-2018 Vue Conference (March 2018) Conference Site: http://us.vuejs.org/ Video Link: https://www.youtube.com/watch?v=AiF3XOu02-0\u0026list=PLJNLwTPak6dj-HOz4eFrKDJsJSZsgvWPs\u0026index=1 Gophercon Singapore 2018 (May 2018) Conference Link: https://2018.gophercon.sg/ Video Links: https://engineers.sg/conference/gopherconsg-2018 Kubecon + CloudNativecon 2018 Europe Conference Link: https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2018/ Conference Videos: https://www.youtube.com/watch?v=OUYTNywPk-s\u0026list=PLj6h78yzYM2N8GdbjmhVU65KYm_68qBmo Pycon US 2018 (May 2018) Conference Link: https://us.pycon.org/2018/ Video Links: https://www.youtube.com/channel/UCsX05-2sVSH7Nx3zuk3NYuQ Pycon APAC 2018 (May 2018) Conference Link: https://pycon.sg/ Video Links: https://engineers.sg/conference/pycon-apac-2018 Dockercon 2018 (June 2018) Conference Link: https://2018.dockercon.com/ Video Links: https://www.youtube.com/watch?v=RnWXOAplvjY\u0026list=PLkA60AVN3hh96Ef6GljWoGpXPG23rN7tQ Scipy 2018 Video Links: https://www.youtube.com/watch?v=y7zGnKzaKIw\u0026list=PLYx7XA2nY5Gd-tNhm79CNMe_qvi35PgUR UseR Conference (July 2018) Conference Link: https://user2018.r-project.org/ Google Cloud Next Conference (July 2018) Conference Link: https://cloud.withgoogle.com/next18/ Conference Sessions: https://cloud.withgoogle.com/next18/sf/sessions Gophercon UK (August 2018) Conference Link: https://www.golanguk.com/ Video Link: 
https://www.youtube.com/watch?v=2mgKDqD5Ga8\u0026list=PLDWZ5uzn69ewsMyuGjVsAnpQIjyud1Cv9 Gophercon Denver - main (August 2018) Conference Link: http://gophercon.com Google Cloud Summit Singapore (September 2018) Event Page: https://cloudplatformonline.com/2018-Summit-Singapore-Home.html Jenkins World 2018 Event Page: https://www.cloudbees.com/devops-world/san-francisco AWS Re:invent Conference (November 2018) Conference Link: https://reinvent.awsevents.com/ Kubecon + CloudNativecon 2018 North America (December 2018) Event Page: https://events.linuxfoundation.org/events/kubecon-cloudnativecon-north-america-2018/ Some resources to follow to learn when the next conferences will take place:\nhttps://frontendfront.com/conferences/\nhttps://github.com/golang/go/wiki/Conferences\nhttps://www.python.org/events/\nhttps://events.linuxfoundation.org/\n","date":"14 February 2018","externalUrl":null,"permalink":"/things-to-watch-out-for-in-2018/","section":"Posts","summary":"A list of conferences, meetups and exhibitions to look out for, especially in 2018:\nThis is a personal list that I’m keeping track of; it mainly revolves around Golang, modern architecture technologies e.g. cloud technologies etc, Python and even R (one of my first languages; I still keep an eye on how it’s doing nowadays.)\n","title":"Things to watch out for in 2018","type":"posts"},{"content":"After a long while on managed platforms for writing blog posts, I decided to move to one which requires me to manage things on my own.\nA few reasons motivated this decision:\nBlog posts being Code Centric # My blog posts are very code-centric, and the managed platforms somehow managed to irritate me when it came to creating them.\nI\u0026rsquo;ve tried a couple, e.g. Blogger (I\u0026rsquo;m guessing it\u0026rsquo;s a Google product but somehow, I\u0026rsquo;m not feeling the polish there), WordPress and Medium. 
WordPress was nice, but it gets pretty complicated when I want to add more code-centric material; it always requires me to go in and adjust things on my own.\nSeeing this, it makes sense to just use Markdown for my blogging. I don\u0026rsquo;t need any of those fancy features or styling; they may actually get in the way. As an example, here is an excerpt of a code snippet written here.\nIt\u0026rsquo;s just Markdown, so if you write documentation for code projects or update code snippets in GitHub issues, this is definitely right up your alley\u0026hellip;\nA Golang snippet\npackage main import \u0026#34;fmt\u0026#34; func main() { fmt.Println(\u0026#34;Hello World!\u0026#34;) } Curiosity on this static file ecosystem # There has been a quiet rise in the number of static site generators nowadays. A long time ago, Jekyll was the definite winner; it\u0026rsquo;s the solution always mentioned on GitHub whenever it came time to build a site to support the code you\u0026rsquo;re writing there.\nHowever, nowadays, there is a growing number of tools available. One of them is Hugo, which is the one being used to write this blog.\nHere are some of the ones I\u0026rsquo;m looking into, along with their links:\nHugo\nhttps://gohugo.io/ Jekyll (For old times\u0026rsquo; sake I guess?)\nhttps://jekyllrb.com/ Gatsby JS (Was laughing at that name. It reminded me of a hair product brand which has a bunch of funny ads)\nhttps://www.gatsbyjs.org/ But if none of them tickle your fancy, you can always refer to this list:\nhttps://www.staticgen.com/\nAfter working with one or two CMS solutions so far, I\u0026rsquo;ve found it troubling that all the post data is stored in some database. 
If the database happens to become corrupt, there may be no way to ever retrieve the data; then again, data corruption can happen to any system, so maybe my logic doesn\u0026rsquo;t really hold here. However, I will say that I prefer all the posts being file based; it makes them easier to identify and modify.\nFlexibility to customize # The platforms I\u0026rsquo;ve tried so far don\u0026rsquo;t allow much customization of the look and feel of the site (not that I needed much of that). However, I was a lot more concerned about the ability to add metadata and custom analytics tracking to the site.\nI wanted to use the site for experimentation, and managed platforms don\u0026rsquo;t really allow that level of experimentation.\nIn Hugo, it is relatively easy to create a code snippet which can be embedded in every section of the site, which makes it possible to switch between Google Analytics, Google Tag Manager or other analytics tags.\nAdditional Remarks # Note to my future self:\nLook at this link for a guide on how to style the blog posts\u0026hellip;\nhttps://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet\nAlso, when using code snippets, you need to specify the language being used. 
Not specifying it leaves the code block as a yellow blob (something similar to code snippets on the Go documentation site)\n","date":"31 January 2018","externalUrl":null,"permalink":"/opinion-piece-moving-to-hugo/","section":"Posts","summary":"After a long while on managed platforms for writing blog posts, I decided to move to one which requires me to manage things on my own.\n","title":"Opinion piece - Moving to Hugo","type":"posts"},{"content":"","date":"31 January 2018","externalUrl":null,"permalink":"/categories/static-sites/","section":"Article Categories","summary":"","title":"Static-Sites","type":"categories"},{"content":"","date":"31 January 2018","externalUrl":null,"permalink":"/tags/static-sites/","section":"Technology Tags","summary":"","title":"Static-Sites","type":"tags"},{"content":"Disclaimer: There are definitely better ways of doing this; this is more of a lazy man\u0026rsquo;s way of doing it. This is just to explore the possibility of getting a golang application into AWS Lambda and successfully running it.\nAlthough Golang support is coming to AWS Lambda (I\u0026rsquo;m totally excited for this; hopefully it will come out this year!), we can still try a few things on our end to get Lambda to run our Go applications. One common way would be to write Golang applications as usual, but instead of just running the command:\ngo build main.go we would use a nice feature that Golang has, cross compilation, and build it for the AWS Lambda runtime.\nenv GOOS=linux GOARCH=amd64 go build main.go This generates an executable that may not run on the machine that built it (unless that machine is itself Linux on amd64), but it should run on Linux machines just fine.\nIn Python, we would just wrap that executable and call it accordingly. 
To put it simply: since AWS didn\u0026rsquo;t support a Golang runtime at the time, we could create a binary compatible with the OS that hosts AWS Lambda and then have Python code in the available Python runtime run it. Essentially, any language that produces an executable binary can be used here, as long as it is built for the right target OS and architecture.\nfrom __future__ import print_function import json import subprocess def lambda_handler(event, context): value = subprocess.check_output(\u0026#34;./main\u0026#34;, shell=True) print(value) response = { \u0026#34;statusCode\u0026#34;: 200, \u0026#34;headers\u0026#34;: { \u0026#34;Content-Type\u0026#34;: \u0026#34;*/*\u0026#34; } } return response We would try to run this very simple Go code example (the Hello World example)\npackage main import \u0026#34;fmt\u0026#34; func main() { fmt.Println(\u0026#34;Hello World\u0026#34;) } The deploy.sh file is below:\n# Create the distribution folder rm -rf dist rm -f dist.zip mkdir dist # Build the go application env GOOS=linux GOARCH=amd64 go build main.go # Copy lambda files in cp lambda_function.py ./dist/lambda_function.py cp main ./dist/main # Generate the distribution zip cd dist zip -r dist.zip . cd .. cp dist/dist.zip dist.zip To view the full code for this example: https://github.com/hairizuanbinnoorazman/demonstration/tree/master/trying_aws_lambda/raw/go_example\nI can think of very few reasons to want to do this. Due to the nature of AWS Lambda, where code might take a while to start running (e.g. the cold start problem), there is little point in having extremely efficient code. Only if you are doing extremely heavy compute that Python or the other languages supported by AWS Lambda cannot handle well would this Go approach help a little. 
Other than that, it might be better to just go along with what AWS Lambda provides.\nIf you would prefer an explanation and an example of this, you might want to watch this clip.\nHowever, with the upcoming support from AWS, it is probably unnecessary to do all these weird workarounds. We can upload the Go code straight into AWS Lambda and have it execute there.\nEven with all that simplicity, it would be ideal to rely on the Serverless Framework to handle the administrative portions of the serverless functions. There are many things to consider when writing serverless functions, one of which is setting up the required security roles. If the functions depend on queues etc, it would require work to administer all that additional functionality.\nNaturally, as programmers, we would want all that work automated away, which could mean writing Ansible scripts or even CloudFormation templates (AWS specific). However, one can just rely on the Serverless Framework to do all that heavy lifting.\n","date":"15 January 2018","externalUrl":null,"permalink":"/using-go-in-aws-lambda/","section":"Posts","summary":"Disclaimer: There are definitely better ways of doing this; this is more of a lazy man’s way of doing it. This is just to explore the possibility of getting a golang application into AWS Lambda and successfully running it.\n","title":"Using Go in AWS Lambda","type":"posts"},{"content":"Full Playlist can be found here: https://www.youtube.com/watch?v=Z3aBWkNXnhw\u0026list=PLj6h78yzYM2P-3-xqvmWaZbbI1sW-ulZb\nCloud Native Landscape: https://github.com/cncf/landscape\nKeynote: Can 100 Million Developers Use Kubernetes? Kubernetes: This Job is Too Hard: Building New Tools, Patterns and Paradigms to Democratize Weaving the Service Mesh Microservices, Service Mesh, and CI/CD Pipelines: Making It All Work Together Developing Locally with Kubernetes State of Serverless Keynote: What\u0026rsquo;s Next? 
Getting Excited about Kubernetes in 2018 Keynote: Manage the App on Kubernetes Video References\nKeynote: Can 100 Million Developers Use Kubernetes? # Video Link: https://www.youtube.com/watch?v=21l8v6eObcc Summary: The talk is more philosophical than technical; it considers who the new users of the Kubernetes platform will be (e.g. young people) and how to get them to understand and learn cloud primitives Mentions OpenFaaS: https://github.com/openfaas/faas Mentions Minio: https://www.minio.io/ Kubernetes: This Job is Too Hard: Building New Tools, Patterns and Paradigms to Democratize # Video Link: https://www.youtube.com/watch?v=gCQfFXSHSxw Summary: Raises the view that it is really quite difficult to Mentions the Metaparticle project: https://metaparticle.io/ Weaving the Service Mesh # Video Link: https://www.youtube.com/watch?v=WFEllbmRI8U Summary: Overview of the Istio Service Mesh project Nice Service Mesh Features Observability Resiliency Traffic Control Security Policy Enforcement Zero Code Change Microservices, Service Mesh, and CI/CD Pipelines: Making It All Work Together # Video Link: https://www.youtube.com/watch?v=UbLG_qUyCgM Summary: Overview of how to build such a pipeline to develop and deploy applications quickly, and the tools that can be used to do so.
https://open.microsoft.com/2017/10/23/announcing-brigade-event-driven-scripting-kubernetes/ https://brigade.sh/ https://draft.sh/ Kashti - A dashboard project built on top of Brigade: https://github.com/Azure/kashti Developing Locally with Kubernetes # Video Link: https://www.youtube.com/watch?v=_W6O_pfA00s Summary: How to introduce Kubernetes to developers, given that Kubernetes is more of an ops tool than a developer tool http://gist-reveal.it/bit.ly/k8s-workshops#/a-modular-workshop-series-for-learning-kubernetes https://github.com/datawire/telepresence State of Serverless # Video Link: https://www.youtube.com/watch?v=SNJipRS8qxw List of Serverless Platforms out there\u0026hellip; Apache OpenWhisk: https://openwhisk.apache.org/ AWS Lambda: https://aws.amazon.com/lambda/ Google Cloud Functions: https://cloud.google.com/functions/ Kubeless: http://kubeless.io/ Fission: http://fission.io/ Nuclio: https://nuclio.io/ Iron Functions: http://open.iron.io/ OpenFaaS: https://www.openfaas.com/ Keynote: What\u0026rsquo;s Next? Getting Excited about Kubernetes in 2018 # Video Link: https://www.youtube.com/watch?v=lUnD9SJDgo8 Build Faster, Smarter, Better Inspiration from the Ruby on Rails community - Make it very easy to add one more thing to an application so that it becomes more useful more quickly. The Year of the Service Mesh. Making microservices easier to get up and running. Handle the harder parts of distributed applications Istio Envoy Conduit Make Data Workloads Easier Making it easier to deploy Machine Learning Applications on the Kubernetes platform GPU support? Integrating Serverless Natively?
Apache OpenWhisk Fission Kubeless Some common patterns Event driven Idling Simple build Fast start up Defining apps via configuration tools (Improve app and kube configurations) Helm kubecfg ksonnet kompose kedge app-def-wg (App Definitions Working Group) https://github.com/kubernetes/community/tree/master/wg-app-def Change how we operate Extensible and secure identities Istio Kerberos SPIFFE Container identity working group Policy, Multi-tenancy, Integration LDAP Open Policy Agent Better container runtimes/VM Interesting Projects\u0026hellip; https://github.com/appscode/kubed https://github.com/heptio/ark https://github.com/cloudnativelabs/kube-router https://github.com/GoogleCloudPlatform/kube-metacontroller Keynote: Manage the App on Kubernetes # Video Link: https://www.youtube.com/watch?v=ul624nYC8pw Questions to answer: What app types are there? Versions? What app instances are deployed? How many? Where? What is the app instance health? How much does it cost? Who are the app owners? Who gets paged? What CI pipelines are associated with each app? Some of the resources currently used in an attempt to track this (Spreadsheet, App Wiki, Tribal Knowledge) Owners Dashboards Metrics/SLAs Docs https://coreos.com/open-cloud-services/ Create a shared toolkit App Catalog App Types App Versions App Instances https://github.com/kubernetes/community/tree/master/wg-app-def ","date":"8 January 2018","externalUrl":null,"permalink":"/interesting-points-from-kubecon/native-con-2017/","section":"Posts","summary":"Full Playlist can be found here: https://www.youtube.com/watch?v=Z3aBWkNXnhw\u0026list=PLj6h78yzYM2P-3-xqvmWaZbbI1sW-ulZb\nCloud Native Landscape: https://github.com/cncf/landscape\nKeynote: Can 100 Million Developers Use Kubernetes?
Kubernetes: This Job is Too Hard: Building New Tools, Patterns and Paradigms to Democratize Weaving the Service Mesh Microservices, Service Mesh, and CI/CD Pipelines: Making It All Work Together Developing Locally with Kubernetes State of Serverless Keynote: What’s Next? Getting Excited about Kubernetes in 2018 Keynote: Manage the App on Kubernetes Video References\n","title":"Interesting Points from Kubecon/Native con 2017","type":"posts"},{"content":"Following from the previous blog post: Using AWS Lambda for Data Science Projects and Automations - Part 1\nLet\u0026rsquo;s deploy a serverless application!\nProblem Statement:\nThe application we will try out this time does the following:\n\u0026ldquo;Read csv files when they are loaded into S3, load them via the pandas package, sum a numeric column and then send the result of that analysis to Slack.\u0026rdquo;\nIt sounds like quite a mouthful, and it also sounds simple, but with all the gotchas surrounding the AWS Lambda platform, we need to tread carefully and try each step before proceeding.\nLet\u0026rsquo;s break the problem into smaller bits which we can then try out.\nLoad up the Requests package Load up the Pandas package Read event value when AWS S3 is triggered Prepare the Slack URL to receive the result of the \u0026lsquo;analysis\u0026rsquo; which is the summing of values of a column Load up the Requests Package # Gotcha: You cannot just install Python packages on an AWS Lambda function. You need to package the installed libraries together with your codebase.\nGotcha: If you use the API Gateway, ensure that the output format is right.
In this case, you need to return a dictionary with a statusCode and a Content-Type header.\nFor the latest codebase to handle this: https://github.com/hairizuanbinnoorazman/demonstration/tree/master/trying_aws_lambda/raw/requests_example\nA copy of the code is also available here, in case the above link becomes unavailable:\nThis is the minimum codebase to get something started in AWS Lambda.\ndeploy.sh\n# Create the virtual environment rm -rf temp virtualenv temp source temp/bin/activate pip install -r requirements.txt # Create the distribution folder rm -rf dist mkdir dist # Copy lambda files in cp lambda_function.py ./dist/lambda_function.py cp -r ./temp/lib/python2.7/site-packages/* ./dist/ # Generate the distribution zip cd dist zip -r dist.zip . cd .. cp dist/dist.zip dist.zip # Deactivate virtual environment deactivate requirements.txt\nrequests lambda_function.py\nfrom __future__ import print_function import json import requests def lambda_handler(event, context): response = requests.get(\u0026#34;https://www.google.com\u0026#34;) print(\u0026#34;Print the status code of this\u0026#34;) print(response.status_code) response = { \u0026#34;statusCode\u0026#34;: 200, \u0026#34;headers\u0026#34;: { \u0026#34;Content-Type\u0026#34;: \u0026#34;*/*\u0026#34; } } return response Load up the Pandas Package # Gotcha: The approach above to load the requests package cannot be used to load the pandas package.
We need to build the C extensions behind the pandas library, which means they must be compiled on a machine matching the one that runs Lambda.\nAfter running the deploy.sh script in the container, we need to run a slightly \u0026lsquo;hackish\u0026rsquo; command.\nFor the latest codebase for this: https://github.com/hairizuanbinnoorazman/demonstration/tree/master/trying_aws_lambda/raw/pandas_example\nFYI: You will need to install Docker to run this example.\nTo generate the dist.zip file needed for the Lambda function, first run the docker_commands.sh shell script. After that, we should land inside the shell of the Docker container.\nInside the Docker shell, we just need to run deploy.sh. This generates the dist.zip, but we still need to export it out of the container. There is a command for that too, so just follow along.\ndeploy.sh is the same as in the requests example above\nrequirements.txt\nrequests pandas lambda_function.py\nfrom __future__ import print_function import json import requests import pandas as pd def lambda_handler(event, context): # Testing out pandas names = [\u0026#39;Bob\u0026#39;,\u0026#39;Jessica\u0026#39;,\u0026#39;Mary\u0026#39;,\u0026#39;John\u0026#39;,\u0026#39;Mel\u0026#39;] births = [968, 155, 77, 578, 973] BabyDataSet = list(zip(names,births)) print(BabyDataSet) df = pd.DataFrame(data = BabyDataSet, columns=[\u0026#39;Names\u0026#39;, \u0026#39;Births\u0026#39;]) print(\u0026#34;Printing out the dataframe\u0026#34;) print(df) response = { \u0026#34;statusCode\u0026#34;: 200, \u0026#34;headers\u0026#34;: { \u0026#34;Content-Type\u0026#34;: \u0026#34;*/*\u0026#34; } } return response Dockerfile\nFROM amazonlinux RUN yum install -y python27-pip zip RUN pip install virtualenv ADD . . docker_commands.sh\ndocker build -t lambdafunction . docker run -it --name awslambda lambdafunction /bin/bash # docker cp awslambda:/dist.zip .
Additional commands not in any of the files:\ndocker cp awslambda:/dist.zip . This copies the dist.zip file out of the container, which we can then upload for our Lambda function.\nYou will notice that the dist.zip is much larger than in the previous requests example. Because of this, we need to use the alternative method: upload the zip file to S3 and then have AWS Lambda load it from there.\nIf we keep pretty much everything else the same as in the previous example, we can just switch out the script and it should still work as expected. (The whole point of this step is to test getting pandas into AWS Lambda, after all.)\nRead event value when AWS S3 is triggered # Gotcha: Ensure that the name of the csv file does not contain spaces or other special characters. The object key in the S3 event is URL-encoded (a space becomes +, for example), which causes issues when the AWS Lambda function is triggered.\nGotcha: Don\u0026rsquo;t mess up when creating the S3 trigger.\nGotcha: A major issue is getting the permissions right. If you don\u0026rsquo;t, the function will keep complaining that it doesn\u0026rsquo;t have the permissions needed to access the resources it needs to run, e.g. S3 or CloudWatch Logs. One of the worst things that happened while I was experimenting with this was that I accidentally disabled CloudWatch logging as well as S3 access for a Lambda function. The function was rendered useless and there weren\u0026rsquo;t even logs to indicate why!! So yeah, try not to fiddle with permissions too much; rather, get familiar with them and get them right.\nNext, we change the example to bring it closer to what we actually want.
An S3 trigger will send us the csv files, which the function then reads to run our \u0026lsquo;analysis\u0026rsquo;.\nYou can get the latest code here: https://github.com/hairizuanbinnoorazman/demonstration/tree/master/trying_aws_lambda/raw/s3_example\nThe data to test that functionality can be found here: https://github.com/hairizuanbinnoorazman/demonstration/tree/master/trying_aws_lambda/raw/s3_example_data\nMost of the files are the same as before and can be copied from the previous sections, except for this file:\nlambda_function.py\nfrom __future__ import print_function import json import requests import pandas as pd import os # Comes with AWS Lambda import boto3 s3_client = boto3.client(\u0026#39;s3\u0026#39;) def lambda_handler(event, context): # Testing out triggers with AWS S3 for record in event[\u0026#39;Records\u0026#39;]: key = record[\u0026#39;s3\u0026#39;][\u0026#39;object\u0026#39;][\u0026#39;key\u0026#39;] bucket = record[\u0026#39;s3\u0026#39;][\u0026#39;bucket\u0026#39;][\u0026#39;name\u0026#39;] print(key) print(bucket) download_path = \u0026#34;/tmp/temp.csv\u0026#34; s3_client.download_file(bucket, key, download_path) print(os.path.isfile(download_path)) df2 = pd.read_csv(download_path) print(df2) value = sum(df2[\u0026#39;col1\u0026#39;]) print(value) return \u0026#34;success\u0026#34; On the AWS Lambda creation page, instead of using the API Gateway trigger, we use an S3 trigger. Configure it to activate on any Object Created event with a suffix of csv.
This makes the bucket trigger the function every time a csv file is added to it.\nAs mentioned above, there are some quirks with this setup, so rather than using an existing bucket, use a fresh S3 bucket for testing.\nPrepare the Slack URL to receive the result of the \u0026lsquo;analysis\u0026rsquo; which is the summing of values of a column # There is nothing special about this step. We just post the result to a Slack webhook once the sum is computed:\n# Append the following inside the loop, after df2 is loaded value = sum(df2[\u0026#39;col1\u0026#39;]) url = \u0026#34;https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX\u0026#34; payload = {\u0026#34;text\u0026#34;: \u0026#34;Total for col1 is %s\u0026#34; % (value)} r = requests.post(url, json=payload) Summary # If you look at all the steps above, it seems quite troublesome to handle all that just to get a serverless function up and running. If it is this troublesome just to get one running, is it really worth all that effort?\nThis is where tools and frameworks help a lot. One framework we can use is the Serverless Framework: https://serverless.com/.\n","date":"2 January 2018","externalUrl":null,"permalink":"/using-aws-lambda-for-data-science-projects-and-automations-2/","section":"Posts","summary":"Following from the previous blog post: Using AWS Lambda for Data Science Projects and Automations - Part 1\nLet’s deploy a serverless application!\nProblem Statement:\nThe application we will try out this time does the following:\n","title":"Using AWS Lambda for Data Science Projects and Automations 2","type":"posts"},{"content":" A thought experiment # Let\u0026rsquo;s say that one day during your usual work hours, you are tasked with handling some data transformations between your data sources. The data source is a csv file generated by backend systems every hour.
These data sources need to be analyzed as soon as possible and the insights relayed to the marketing and business intelligence teams. How should we handle this? (Of course, we should aim for as cheap a solution as possible.)\nIf we were to do this task the usual way, we might spin up a single AWS EC2 compute instance. To make the comparison fair, let\u0026rsquo;s say the memory requirement for this compute is 1GB. If you were to check the cost of this, it would be:\nMaintaining a 1GB EC2 compute instance on demand for a month, as of 20 December 2017: $8.50\nHowever, let\u0026rsquo;s say we build a different solution instead, relying on AWS Lambda. For a pessimistic estimate, assume each transformation run takes 5 minutes on 1GB of memory, and ignore the free tier of 1 million requests that AWS grants to all of its users (as of 20 December 2017). The price would go as follows:\nAs of 20 December 2017, maintaining 1GB of memory for AWS Lambda, 5 minutes each time to transform data, no free requests available. The calculations can be condensed below:\nSize of AWS Lambda: 1GB Memory Length of Function: 5 minutes Number of times to run per day: 24 No of days in a month: 30 Cost of AWS Lambda for the memory specified: $0.000001667 per 100ms These numbers are just estimates to quickly compare cost effectiveness. By default, initial usage of AWS Lambda is free for a certain amount of compute time, but we are not taking that into account for now.\nEstimated Cost: $3.60 (5 minutes is 3,000 billing units of 100ms, so each run costs 3,000 × $0.000001667 ≈ $0.005; over 24 × 30 = 720 runs a month, that is about $3.60)\nThe prices are roughly comparable in this scenario, but let\u0026rsquo;s instead say that the workload only needs to run once a day. The cost of running such a compute drops to mere cents (30 runs × $0.005 ≈ $0.15 a month).
This makes running the script on an EC2 instance for the whole month much more expensive in comparison.\nHowever, the numbers above are just estimates; we would still need to run actual experiments to properly compare the cost effectiveness of the two solutions (hosting a data science automation on EC2 vs AWS Lambda).\nComparing approaches # Instead of just looking at cost alone, let\u0026rsquo;s take a look at other things that should be considered when architecting a solution.\nEase of getting resources to understand deploying to EC2 or AWS Lambda # With a quick search on Google, you will find that it is slightly harder to find examples of solutions built on AWS Lambda than on EC2. This is understandable; the EC2 method is pretty common: install the dependencies and then rely on cron to run the scripts. It is harder to find material for AWS Lambda; the approach is newer (although it is already a few years old) and not many people are introduced to this way of doing things early on. Programming books still rely on servers or managed web platforms (Heroku?) to deploy applications.\nOperating the solution on EC2 or AWS Lambda # This might be tough to conclude properly. AWS Lambda takes away some of the operational effort; it comes out of the box with AWS CloudWatch Logs and metrics. At the same time, the solution can be relatively standalone, which makes it easy to scale out if needed. For EC2, unfortunately, this is a lot harder. Most of the work to make it operationally easier falls on the developer, e.g. shipping your logs to a centralized logging system, hooking up metrics, and ensuring you have an immutable image when deploying a solution (I believe this is really vital when deploying services to the cloud).
The tooling is available, but it would be really hard for those who have just started scripting to take on such tasks.\nDevelopment ease/difficulty # AWS Lambda is a managed platform, and as with all managed platforms, you cannot install whatever you like on it. If you read the AWS Lambda documentation and plenty of Stack Overflow posts, you will get a hint of how to do this: we install the libraries and package them together with our code for AWS Lambda. (It sounds easier than it actually is.) On the other hand, installing packages on your servers/containers is a piece of cake, but this also means that if a package management issue makes the server/container unstable, that is your problem.\nLet\u0026rsquo;s Code\u0026hellip; # \u0026hellip;but not in this post. This post is long enough. The next post in this series of serverless blog posts will cover attempts to write and deploy code on AWS Lambda or other serverless platforms.\n","date":"20 December 2017","externalUrl":null,"permalink":"/using-aws-lambda-for-data-science-projects-and-automations-1/","section":"Posts","summary":"A thought experiment # Let’s say that one day during your usual work hours, you are tasked with handling some data transformations between your data sources. The data source is a csv file generated by backend systems every hour. These data sources need to be analyzed as soon as possible and the insights relayed to the marketing and business intelligence teams. How should we handle this? (Of course, we should aim for as cheap a solution as possible.)\n","title":"Using AWS Lambda for Data Science Projects and Automations 1","type":"posts"},{"content":"I\u0026rsquo;ve been learning plenty of Golang lately, and one of the most common design patterns that I keep hearing about is the decorator pattern.
It is often used when handling web requests: you create a function that accepts a struct implementing the handler interface and returns a struct that also implements the handler interface.\nI didn\u0026rsquo;t think too much about it until I watched the following video on Go kit:\nEssentially, this pattern allows one to reduce code bloat in a domain function, bloat that is usually caused by all the additional software activities; most of these may not be essential for the business logic, but they are essential for ensuring the running and correctness of the software/script.\nThe video uses an analogy that describes software as an onion, where decorator functions keep adding layers of functionality around the function that implements the business logic. This makes the whole software extremely flexible and opens up a lot of new doors.\nBut enough theory; let\u0026rsquo;s do something practical!\nLet\u0026rsquo;s say you have this function in a Python script.\ndef very_important_function(param1, param2): print(\u0026#34;The first param is: %s\u0026#34; % param1) print(\u0026#34;The second param is: %s\u0026#34; % param2) result = param1 + param2 return(result) And you managed to ship it to your production system and you\u0026rsquo;re proud of it.\nHowever, your manager comes along and says:\n\u0026ldquo;Hey! We need logging for this!
We need to know what\u0026rsquo;s going on in the function!!\u0026rdquo;\nWith that, you adjust the code accordingly:\nimport logging def very_important_function(param1, param2): logging.info(\u0026#34;Parameter 1 is %s\u0026#34; % param1) logging.info(\u0026#34;Parameter 2 is %s\u0026#34; % param2) print(\u0026#34;The first param is: %s\u0026#34; % param1) print(\u0026#34;The second param is: %s\u0026#34; % param2) result = param1 + param2 logging.info(\u0026#34;very important function complete\u0026#34;) return(result) However, now another developer comes along and says:\n\u0026ldquo;Hey!! We need to get timings for your function! Without them, it would be difficult to optimize our script; we wouldn\u0026rsquo;t know where to start!\u0026rdquo;\nOnce again, you go along and adjust the code accordingly:\nimport logging import time def very_important_function(param1, param2): # Get the start time start_time = time.time() # Logging to check which parameters are being passed in logging.info(\u0026#34;Parameter 1 is %s\u0026#34; % param1) logging.info(\u0026#34;Parameter 2 is %s\u0026#34; % param2) print(\u0026#34;The first param is: %s\u0026#34; % param1) print(\u0026#34;The second param is: %s\u0026#34; % param2) result = param1 + param2 # Logging to indicate the function has completed running logging.info(\u0026#34;very important function complete\u0026#34;) # Get the end time end_time = time.time() # Logging the timing information out logging.info(\u0026#34;start time: %s\u0026#34; % start_time) logging.info(\u0026#34;end time: %s\u0026#34; % end_time) logging.info(\u0026#34;function duration: %s\u0026#34; % (end_time-start_time)) return(result) Hmm\u0026hellip; The codebase is starting to become quite ugly. There is too much code bloat: in this case, the auxiliary code is larger than the business logic/domain logic code, which is what actually matters to the business.\nSo, seeing this, what can we do to improve our situation?
This is where implementing the decorator pattern helps. (Each language has its own way of implementing it; this post focuses on Python.)\nLet\u0026rsquo;s create the following decorators.\nThis is the timing logger decorator:\nimport logging import time from functools import wraps def timing_logger(func): @wraps(func) def wrapper(*args, **kwargs): start_time = time.time() func(*args, **kwargs) end_time = time.time() # Logging the timing information out logging.info(\u0026#34;start time: %s\u0026#34; % start_time) logging.info(\u0026#34;end time: %s\u0026#34; % end_time) logging.info(\u0026#34;function duration: %s\u0026#34; % (end_time-start_time)) return wrapper The wraps decorator copies the wrapped function\u0026rsquo;s metadata (such as its __name__ and docstring) onto wrapper. Without wraps, the decorated function would report itself as wrapper, which makes it slightly harder to inspect and debug.\nWe still need the other logging function, which records what kind of inputs are being fed into the very_important_function function.\nimport logging from functools import wraps def input_logger(func): \u0026#34;\u0026#34;\u0026#34; Only accepts named arguments \u0026#34;\u0026#34;\u0026#34; @wraps(func) def wrapper(**kwargs): for key in kwargs: logging.info(\u0026#34;Input %s contains %s\u0026#34; % (key, kwargs[key])) func(**kwargs) return wrapper With these decorators, we can move all the auxiliary software logic out of the main function (very_important_function)\n@timing_logger @input_logger def very_important_function(param1, param2): result = param1 + param2 return(result) However, if you run it, it produces the following output:\nIn[1]: value = very_important_function(param1=2, param2=2) INFO:root:Input param2 contains 2 INFO:root:Input param1 contains 2 INFO:root:start time: 1509003378.33 INFO:root:end time: 1509003378.33 INFO:root:function duration: 1.19209289551e-06 Take note that we are using named parameters here.
This allows us to use the name of each parameter being passed into the function so that it can be logged much more easily.\nHowever, the variable value doesn\u0026rsquo;t contain the expected output! What happened here?\nIf you look into each decorator, you will realize that the decorators are not returning the output of the func being passed in. If we alter the function definitions here\u0026hellip;\nimport logging import time from functools import wraps def timing_logger(func): @wraps(func) def wrapper(*args, **kwargs): start_time = time.time() value = func(*args, **kwargs) end_time = time.time() # Logging the timing information out logging.info(\u0026#34;start time: %s\u0026#34; % start_time) logging.info(\u0026#34;end time: %s\u0026#34; % end_time) logging.info(\u0026#34;function duration: %s\u0026#34; % (end_time-start_time)) return value return wrapper import logging from functools import wraps def input_logger(func): \u0026#34;\u0026#34;\u0026#34; Only accepts named arguments \u0026#34;\u0026#34;\u0026#34; @wraps(func) def wrapper(**kwargs): for key in kwargs: logging.info(\u0026#34;Input %s contains %s\u0026#34; % (key, kwargs[key])) value = func(**kwargs) return value return wrapper With the modified functions above, the decorators return the values from our important function, and with that, we have solved the issue of running our domain function without any clutter from auxiliary software requirements. :D\nMaybe in the future, I might write a post about going crazy with decorators (e.g. Pinging Slack, Pinging Google Analytics, Logging, Authentication etc - That might be something fun to build)\n","date":"26 October 2017","externalUrl":null,"permalink":"/using-decorator-pattern-to-remove-code-bloat/","section":"Posts","summary":"I’ve been learning plenty of Golang lately, and one of the most common design patterns that I keep hearing about is the decorator pattern.
It is often used when handling web requests: you create a function that accepts a struct implementing the handler interface and returns a struct that also implements the handler interface.\n","title":"Using Decorator Pattern to Remove Code Bloat","type":"posts"},{"content":"A sample application to get started with Go.\nThis application pings a channel on Slack via a webhook. Slack provides a unique webhook URL that a script/application can post messages to.\n/* Example of using Go to ping Slack This posts the text message passed to the postMessage function In order to utilize this file, use the command: go run slack_example.go Else, generate a binary file by running the command: go build slack_example.go */ package main import ( \u0026#34;fmt\u0026#34; \u0026#34;encoding/json\u0026#34; \u0026#34;bytes\u0026#34; \u0026#34;io/ioutil\u0026#34; \u0026#34;net/http\u0026#34; ) type message struct { Text string `json:\u0026#34;text\u0026#34;` } func postMessage(msg string) { slackUrl := \u0026#34;https://hooks.slack.com/services/{KEYS}\u0026#34; // Create a reader to be used by http.Post response := message{Text: msg} body, _ := json.Marshal(response) byteBody := bytes.NewReader(body) res, err := http.Post(slackUrl, \u0026#34;application/json\u0026#34;, byteBody) if err != nil { fmt.Println(err.Error()) fmt.Println(\u0026#34;Try again later.\u0026#34;) return } defer res.Body.Close() fmt.Println(\u0026#34;Status of response:\u0026#34;, res.Status) fmt.Println(\u0026#34;Status code of response:\u0026#34;, res.StatusCode) content, err := ioutil.ReadAll(res.Body) if err != nil { fmt.Println(err.Error()) } fmt.Println(string(content)) } func main() { fmt.Println(\u0026#34;A test application to fire a message into Slack\u0026#34;) postMessage(\u0026#34;init\u0026#34;) } Some of the concepts introduced here:\nIntroduction to a variety of packages from the standard library Introduction to the Go toolchain, namely go build, go install Some
improvements can be made to the above program by turning it into a proper command line tool. This would include accepting arguments, parsing them and then sending them off to the Slack channel. One library that could help build such a command line tool is cobra: https://github.com/spf13/cobra ","date":"22 October 2017","externalUrl":null,"permalink":"/using-go-to-post-messages-on-slack/","section":"Posts","summary":"A sample application to get started with Go.\nThis application pings a channel on Slack via a webhook. Slack provides a unique webhook URL that a script/application can post messages to.\n","title":"Using Go to post messages on Slack","type":"posts"},{"content":"This is a suggestion piece, not a recommended way of using Docker.\nMotivation # The question here is how exactly we run all the unit tests for applications built via Docker. One way to do this is to rely on a build server like Jenkins to create the environment needed for a build and then run the unit tests. However, this means we need to bootstrap an environment to do so.\nIdea 1: Embed the testing scripts into the Dockerfile for building the application # We would not be able to use the existing Dockerfile for building the system as is. If we run a test command via RUN in the Dockerfile, its result might be cached in future builds, which we definitely don\u0026rsquo;t want.\nAnother reason not to just run the testing command in the Dockerfile used for building the application is that we might bloat the Docker image with unnecessary files, which goes against one of Docker\u0026rsquo;s best practices: keeping images small and minimal.\nSo this might be a bad way of running unit tests. I would say that this solution kind of works, but it feels inelegant.
We would need to sacrifice caching for the last set of layers, as well as live with bloated containers that contain testing scripts and test data.\nIdea 2: Create separate Dockerfiles to run tests and separate Dockerfiles for builds # We can have separate Dockerfiles to run tests and to create the application builds. This allows us to load the required testing scripts and fake test data into the testing container, which can then be used to run unit tests.\nHowever, if you think about it, if the two Dockerfiles (one for tests and one for builds) are separated that way, we would need to ensure that the environments of the two are exactly the same, or else the benefits of using Docker go away. This means we now need a third Dockerfile that serves as a form of \u0026ldquo;base\u0026rdquo;, which we can then use to build up the testing and application Dockerfiles.\nI wouldn\u0026rsquo;t recommend this though. Having three Dockerfiles sounds like a pain to maintain.\nIdea 3: Making use of Multistage builds for Docker but not for its intended purposes # If you read about the purpose of the multistage build feature in Docker, it is really not meant for testing code. It is meant to let users create lightweight containers without including files that are only needed at build time and are useless in production.\nThe usual example is a Go application. Go applications compile down to a binary, and only that binary is needed to run in the container. You can see an example of this and its benefits in the article below.\nMultistage build for Docker https://docs.docker.com/engine/userguide/eng-image/multistage-build/\nOk, but back to the topic. How should we use it for our scenario?\nLet\u0026rsquo;s first create 2 files: a Python test script and a Dockerfile.
(I would imagine it would work with other languages as well?)\ncontent of test_sample.py\ndef inc(x): return x + 1 def test_answer(): assert inc(3) == 4 Save the file above as test_sample.py Install the pytest Python library and add all files in the current directory\nFROM python:3 as base RUN pip install pytest ADD . . This is the container build that will run the \u0026ldquo;unit tests\u0026rdquo;\nFROM base RUN pytest test_sample.py These are the build statements that will create the container meant for deployment\nFROM base CMD python Save the file above as Dockerfile\nAs you can see from above, the three Dockerfiles from the second idea are still there in concept, but all the statements now live in the same file. Another good thing is that it is possible to refer to intermediate builds (the base stage is used to build up both the container that runs the tests and the container for deployment).\nWe can run the Docker build command as follows:\ndocker build -t awesome_app . This works! Unfortunately, the problem mentioned in Idea 1 shows up here again. The test script line will be cached, but we don\u0026rsquo;t want it to be cached at all! They\u0026rsquo;re unit tests; they need to run on every build to ensure we are meeting the minimum application spec.\nWe can resolve this by adding a build argument to the Dockerfile: ARG cache=1. Adding it to our testing snippet gives:\nBuilding the base image and dependencies\nFROM python:3 as base RUN pip install pytest ADD . . Meant for running tests\nFROM base ARG cache=1 RUN pytest test_sample.py Meant for building the deployment container\nFROM base CMD python Instead of running the docker build command from above, we need to alter it slightly so that it always busts the cache for the portion of the Docker build process that does testing.\ndocker build -t awesome_app --build-arg cache=$(date +%Y-%m-%d:%H%M:%s) . 
This ensures that the cache for the testing step is busted on every build.\nBut, if you read the Docker docs, you can argue that if the cache is busted for one of the lines in the Dockerfile, then the cache for all the layers after it would also be busted, which might mean a rebuild of the application.\nLucky for us, this does not happen. Busting the cache through the ARG only affects the stage of the multistage build in which that ARG is declared. To test this, we can alter the Dockerfile above to the following:\nBuilding the base image and dependencies\nFROM python:3 as base RUN pip install pytest ADD . . Meant for running tests\nFROM base ARG cache=1 RUN pytest test_sample.py Meant for building the deployment container\nFROM base RUN pip install requests RUN pip install flask CMD python If we run this multiple times via the following command:\ndocker build -t awesome_app --build-arg cache=$(date +%Y-%m-%d:%H%M:%s) . On the initial build, requests and flask would be installed. On subsequent builds, the section of the Dockerfile that installs requests and flask keeps using the cached layers, while the section that runs the tests is always rerun, as the build argument busts its cache on each and every docker build.\nAnyways, just a random thought here. I would assume that unit tests only need to run when new code is added, so if I\u0026rsquo;m not wrong, the ADD should bust the cache when that happens and all the statements after it would be invalidated and rerun. 
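To make the caching argument above a little more concrete, here is a deliberately simplified model of the build cache in Python. It is not Docker's actual algorithm. It assumes that each step's cache key is a hash of the previous key and the step text. It also assumes that a RUN step hashes the values of any ARGs declared earlier in the same stage, which matches Docker's documented behavior that a changed ARG causes a cache miss at its first use rather than at its definition. The stages mirror the Dockerfile above.

One caveat is worth checking on newer Docker versions. BuildKit, now the default builder, skips stages that the final stage does not depend on. The test stage may therefore need a name and an explicit `docker build --target` before it actually runs.

```python
import hashlib


def step_keys(parent_key, steps, build_args):
    # Simplified, hypothetical model of the build cache: every step key
    # hashes the previous key plus the step text. RUN steps also hash the
    # ARG values declared earlier in the same stage, so a new ARG value
    # invalidates the first RUN after it and everything that follows.
    declared = {}
    key = parent_key
    keys = []
    for step in steps:
        material = step
        if step.startswith('ARG '):
            name, _, default = step[4:].partition('=')
            declared[name] = build_args.get(name, default)
        elif step.startswith('RUN '):
            material = step + repr(sorted(declared.items()))
        key = hashlib.sha256((key + material).encode()).hexdigest()
        keys.append(key)
    return keys


BASE = ['FROM python:3', 'RUN pip install pytest', 'ADD . .']
TEST = ['ARG cache=1', 'RUN pytest test_sample.py']
DEPLOY = ['RUN pip install requests', 'RUN pip install flask', 'CMD python']


def build(build_args):
    # Both the test stage and the deployment stage start from the base key.
    base_key = step_keys('', BASE, build_args)[-1]
    return {
        'test': step_keys(base_key, TEST, build_args),
        'deploy': step_keys(base_key, DEPLOY, build_args),
    }


first = build({'cache': '2017-10-17:1200:1508241600'})
second = build({'cache': '2017-10-17:1201:1508241660'})
print(first['deploy'] == second['deploy'])    # True: deployment layers reused
print(first['test'][1] == second['test'][1])  # False: the pytest step reruns
```

Both results follow from the model. The deployment stage never declares the `cache` ARG, so a new timestamp cannot change its keys.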
You might not need the approach mentioned here; with careful organization of the steps in the Dockerfile, it is possible to have a simpler Dockerfile.\nTLDR; Idea 3 seems to be the best in terms of the following:\nKeeping to a single Dockerfile rather than complicating the software project further (while still being able to run unit tests)\n","date":"17 October 2017","externalUrl":null,"permalink":"/using-docker-multi-stage-builds-to-run-unit-tests/","section":"Posts","summary":"This is a suggestion piece and not a recommended way of using Docker or anything.\nMotivation # The question we want to answer here is how exactly we can run all the unit tests for applications built via Docker. One way to do this is to rely on a build server like Jenkins to create the environment we need for a build and then run the unit tests. However, this would mean that we need to bootstrap an environment to do so.\n","title":"Using Docker Multi Stage builds to run unit tests","type":"posts"},{"content":"Over the weekend, I\u0026rsquo;ve been experimenting to see whether it\u0026rsquo;s possible to set up screen recording on a Linux server. This is partly just out of curiosity but also a little bit out of frustration. Imagine that you want to help people record their training sessions on Google Hangouts, but to do so, you need to be around and your computer needs to be \u0026ldquo;sacrificed\u0026rdquo; for the recording.\nLuckily, it seems that with a little experimentation, it\u0026rsquo;s possible to set up such a service on the side.\nLet\u0026rsquo;s start with what we need to think about before we can get things started.\nVideo/Audio recording software. A lot of people would suggest looking into OBS, but for an automated system on Linux, we might be better off with a command line utility. 
After a little research, the ffmpeg toolkit comes up pretty often in Linux/Ubuntu forums. How do we simulate going into Google Hangouts? We would need a browser simulation tool. Should we go with a headless tool such as PhantomJS, or even Chrome\u0026rsquo;s headless mode? How would the user interact with the tool? Using a scheduling service via Google Calendar? Or allowing the user to call upon the service via Slack? Possibility of hosting it on Docker? For the Docker side of things, it is more of a personal desire to have it hosted on Docker rather than on a normal instance; partly to increase the portability of the tool but also because of my familiarity with Docker. Installing ffmpeg/avconv # To start things off, let\u0026rsquo;s think about the command line toolkit. While looking around the Linux forums, it turns out that ffmpeg is not readily available on some Debian-flavored Linux systems. Instead, we would use avconv, which comes from a fork of the ffmpeg project.\nSo far, there is little difference between the ffmpeg and avconv command line tools. Commands from forum posts that use ffmpeg generally work with avconv as well, so that seems to be no issue from that angle.\nInstalling a browser and simulating it # Previously, I used PhantomJS to simulate the browser, but with the big news that its maintainer is stepping down, we will try Google Chrome\u0026rsquo;s headless mode instead.\nNews on Phantom JS maintainer stepping down https://groups.google.com/forum/#!topic/phantomjs/9aI5d-LDuNE\nThe Google Chrome installation is slightly more tedious as Chrome is not part of the default apt-get package list. 
We would need to add Google\u0026rsquo;s package list to the server\u0026rsquo;s own apt sources to even make it possible to install it.\nThe bash script that I provide later contains that step.\nPossibility of Dockerizing it # Well, we can definitely dockerize the video portion of this screen recording mini project, but it is difficult to record the audio portion in a Docker container. The reason is that the tool needed for this (pulseaudio) seems to require some D-Bus mechanism which is not really exposed to the Docker container; then again, I might just be missing some configuration on the command line that switches it to an alternative mode.\nIn order to research this further, I\u0026rsquo;m looking through some of the Dockerfiles that Jessie Frazelle has put up, which are available here:\nhttps://github.com/jessfraz/dockerfiles\nHowever, even though we can only dockerize the video portion, it can probably be used for other software ideas, e.g. recording a video of how the frontend behaves while the unit tests run.\nPutting it together # So, to put it all together, this is what we have:\nInstallation of all the required components in a bash file. (You may need to install a text editor in order to add it to the server, or you can choose to use git to do so) https://github.com/hairizuanbinnoorazman/video-recording-service/blob/master/install_gce.sh Instructions on how to run the service https://github.com/hairizuanbinnoorazman/video-recording-service/blob/master/USAGE.md The instructions above are still quite complicated as the details are not ironed out yet, but more details will come out soon. 
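As a rough illustration of the recording step, the Python sketch below assembles an ffmpeg-style command that grabs an X display (x11grab) and a PulseAudio source (pulse) into one video file. It is not taken from the linked install script or USAGE.md. The display `:99` (for example, a virtual display started with Xvfb), the resolution, the frame rate and the output name are all assumptions. The option names follow ffmpeg's x11grab and pulse input devices, and older avconv builds may differ slightly, so the program name is a parameter.

```python
import shlex


def recording_command(display=':99', size='1280x720', fps=25,
                      audio_source='default', output='recording.mp4',
                      program='ffmpeg'):
    # Argument list for a screen + audio recording: x11grab reads frames
    # from an X display, pulse reads audio from a PulseAudio source.
    return [
        program,
        '-f', 'x11grab',
        '-video_size', size,
        '-framerate', str(fps),
        '-i', display,
        '-f', 'pulse',
        '-i', audio_source,
        output,
    ]


cmd = recording_command(output='hangout-session.mp4')
print(shlex.join(cmd))
# ffmpeg -f x11grab -video_size 1280x720 -framerate 25 -i :99 -f pulse -i default hangout-session.mp4
```

You can pass the list straight to `subprocess.run`, which avoids shell quoting issues. One thing worth checking is that x11grab can only capture windows actually drawn on the display. A browser running in headless mode would therefore not show up in the recording.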
You can look into the project plan (https://github.com/hairizuanbinnoorazman/video-recording-service#whats-involved) and see if there are any other interesting things to add on to this.\n","date":"27 August 2017","externalUrl":null,"permalink":"/screen-recording-on-the-server/","section":"Posts","summary":"Over the weekend, I’ve been experimenting to see whether it’s possible to set up screen recording on a Linux server. This is partly just out of curiosity but also a little bit out of frustration. Imagine that you want to help people record their training sessions on Google Hangouts, but to do so, you need to be around and your computer needs to be “sacrificed” for the recording.\n","title":"Screen Recording on the Server","type":"posts"},{"content":"While attempting to play around with object ids via the rgoogleslides package, the main issue I had was quickly understanding which object id referred to which element on the slide.\nIt was possible to retrieve the list of objects on a page in a Google slide, but there were too many nested structures in the response of the Google Slides API to easily understand what was going on in the slide.\nIn order to simplify the process, I am experimenting with an abstraction on top of this nested response. 
This gives immediate information such as the object ids of text boxes or the object ids of tables.\nLet\u0026rsquo;s say we have the above slide as an example.\nIf we were to run the get_slide_page_properties function with the upcoming rgoogleslides version 0.3.0, it would return the following object.\nlibrary(rgoogleslides) rgoogleslides::authorize() # Slide id refers to the id of the entire slide deck slide_id \u0026lt;- \u0026#34;aaaaaa-hidden-id-aaaaaaaa\u0026#34; # Slide page id refers to the specific slide within the slide deck slide_page_id \u0026lt;- \u0026#34;p\u0026#34; slide_data \u0026lt;- get_slide_page(slide_id, slide_page_id) slide_data # \u0026lt;SlidePage\u0026gt; # Public: # clone: function (deep = FALSE) # get_tables: function () # get_text_boxes: function () # initialize: function (slide_page_list_response) # raw_response: list slide_data$get_text_boxes() # object_id text_content # 1 g1e4756099b_0_1 Test - Finding out the object ids of each element on this page\\n slide_data$get_tables() # [[1]] # [[1]]$object_id # [1] \u0026#34;g1e4756099b_0_5\u0026#34; # # [[1]]$table # X1 X2 X3 X4 # 1 Test1\\n Test3\\n Test4\\n Test5\\n # 2 Test2\\n # 3 # 4 At a single glance, it is possible to tell that object id g1e4756099b_0_1 refers to the text box and g1e4756099b_0_5 refers to the table. 
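To show what such an abstraction has to do, here is a minimal Python sketch of the same idea. It assumes the page resource returned by the Google Slides API `presentations.pages.get` method. The sketch walks `pageElements`, treats shapes whose `shapeType` is `TEXT_BOX` as text boxes, and turns tables into lists of rows. It is not the rgoogleslides implementation. The function names and the trimmed sample response are hypothetical, and real `content` strings usually end with a newline character, as in the R output above.

```python
def text_of(text_content):
    # Concatenate the textRun pieces of a TextContent object (may be None).
    parts = []
    for element in (text_content or {}).get('textElements', []):
        run = element.get('textRun')
        if run:
            parts.append(run.get('content', ''))
    return ''.join(parts)


def summarize_page(page):
    # Return (text_boxes, tables) found on a Slides API page resource.
    text_boxes, tables = [], []
    for element in page.get('pageElements', []):
        object_id = element.get('objectId')
        shape = element.get('shape')
        table = element.get('table')
        if shape and shape.get('shapeType') == 'TEXT_BOX':
            text_boxes.append((object_id, text_of(shape.get('text'))))
        elif table:
            rows = [[text_of(cell.get('text')) for cell in row.get('tableCells', [])]
                    for row in table.get('tableRows', [])]
            tables.append((object_id, rows))
    return text_boxes, tables


# Trimmed, hypothetical response modelled on the slide in this post.
page = {
    'objectId': 'p',
    'pageElements': [
        {'objectId': 'g1e4756099b_0_1',
         'shape': {'shapeType': 'TEXT_BOX',
                   'text': {'textElements': [
                       {'paragraphMarker': {}},
                       {'textRun': {'content': 'Test - Finding out the object ids'}}]}}},
        {'objectId': 'g1e4756099b_0_5',
         'table': {'tableRows': [
             {'tableCells': [
                 {'text': {'textElements': [{'textRun': {'content': 'Test1'}}]}},
                 {}]}]}},
    ],
}

boxes, tables = summarize_page(page)
print(boxes)   # [('g1e4756099b_0_1', 'Test - Finding out the object ids')]
print(tables)  # [('g1e4756099b_0_5', [['Test1', '']])]
```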
The SlidePage output is just the API response parsed by an object, which checks the type of each object id and retrieves all the data required to identify the different elements within the slide.\nWith that, we can then apply the following changes:\nDelete the text from the text box Insert new text into said text box library(rgoogleslides) rgoogleslides::authorize() # Slide id refers to the id of the entire slide deck slide_id \u0026lt;- \u0026#34;aaaaaa-hidden-id-aaaaaaaa\u0026#34; request \u0026lt;- add_delete_text_request(object_id = \u0026#34;g1e4756099b_0_1\u0026#34;) commit_to_slides(slide_id, request) request2 \u0026lt;- add_insert_text_request(object_id = \u0026#34;g1e4756099b_0_1\u0026#34;, text = \u0026#39;Testtesttest\u0026#39;) commit_to_slides(slide_id, request2) I guess it is easy to imagine the result of the above code snippet; it just replaces the text in the text box with Testtesttest.\nThis set of changes is set to come with version 0.3.0 of the rgoogleslides package, which should be coming quite soon.\n","date":"19 June 2017","externalUrl":null,"permalink":"/matching-object-ids-in-elements-on-a-googleslide/","section":"Posts","summary":"While attempting to play around with object ids via the rgoogleslides package, the main issue I had was quickly understanding which object id referred to which element on the slide.\n","title":"Matching object ids in elements on a Googleslide","type":"posts"},{"content":"The rgoogleslides package is being upgraded with quite a big change in methodology. Refer to the following release notes for more detailed information.\nhttps://github.com/hairizuanbinnoorazman/rgoogleslides/releases/tag/v0.2.0-alpha\nThe previous design of the package suffers from several flaws, some of which are detailed below:\nWrapper functions that wrap internal builder functions. The concept behind this is nice as it lowers the barrier of entry for using the package. 
A quick survey of potential users of the package showed that they are more familiar with ordinary R functions and wouldn\u0026rsquo;t want to fiddle with complex R objects, which was why the wrapper functions were created.\nHowever, the wrapper functions mean duplicated effort as well as poor usage of API limits (refer to the previous post), and they might make the package more difficult to maintain in the future.\nHence, the wrapper functions have been dropped. Instead, R objects hold all the information that needs to be passed to the Google Slides API.\nAt the same time, having wrapper functions could potentially lead to inefficient use of the Googleslides API, as mentioned above. The API is restricted by the number of calls to the service, yet we were underloading each call. Each call can potentially take in tens of updates to be made to the slides, but the wrapper functions would only send a few at a time due to the way they were designed.\nHandling plain lists in the functions made it difficult to validate that the right R object was being passed to each of the functions. This was why several R6 objects were used; namely the GoogleSlidesRequest object as well as the PagePropertyElement object.\nThis version can be installed via the following code:\n# Install this version of rgoogleslides # Current version has been tagged as v0.2.0-alpha # Install devtools R package if you have not installed it yet install.packages(\u0026#34;devtools\u0026#34;) library(devtools) install_github(\u0026#34;hairizuanbinnoorazman/rgoogleslides\u0026#34;, ref=\u0026#34;v0.2.0-alpha\u0026#34;) ","date":"11 May 2017","externalUrl":null,"permalink":"/rgoogleslides-updated-to-v0.2.0-alpha/","section":"Posts","summary":"The rgoogleslides package is being upgraded with quite a big change in methodology. 
Refer to the following release notes for more detailed information.\nhttps://github.com/hairizuanbinnoorazman/rgoogleslides/releases/tag/v0.2.0-alpha\nThe previous design of the package suffers from several flaws, some of which are detailed below:\n","title":"Rgoogleslides Updated to v0.2.0-alpha","type":"posts"},{"content":"IMPORTANT:\nTHE FOLLOWING BLOG POST IS OUTDATED. THERE IS AN UPDATE TO THE GOOGLESLIDES API WHICH DISABLES USAGE OF GOOGLE DRIVE IMAGES. NOW ALL IMAGES HAVE TO BE FROM PUBLIC SOURCES. THERE IS A FEATURE REQUEST TICKET CREATED TO ADD THIS FUNCTIONALITY BACK BUT THERE IS A HIGH LIKELIHOOD IT WON\u0026rsquo;T BE BACK FOR A LONG TIME\nALSO, A NEW PUBLIC GOOGLEDRIVE PACKAGE IS AVAILABLE FOR USE - PLEASE USE THAT ONE INSTEAD FOR UPLOADING ANY ASSETS\nREFER TO THE FOLLOWING TICKET:\nhttps://github.com/hairizuanbinnoorazman/rgoogleslides/issues/28\nLet\u0026rsquo;s say you\u0026rsquo;re an analyst and you want to automate your workflow to send your analysis into Googleslides without much involvement on your part; what would you do?\nWell, it\u0026rsquo;s now possible by using a combination of the googledrive R package and the rgoogleslides package.\nWith the v0.2.0-alpha release, the rgoogleslides package now supports a function to retrieve an image from a user\u0026rsquo;s own Googledrive and put it into the slides. (The Googleslides API only accepts hosted images; it doesn\u0026rsquo;t accept actual png or jpeg files)\nInstalling the googledrive and rgoogleslides packages. The 2 packages are not yet available on CRAN but I intend to push them there soon. :D. 
But in the meantime, this is how you would install the packages:\ninstall.packages(\u0026#34;devtools\u0026#34;) library(devtools) # Install the rgoogleslides R package from the master branch devtools::install_github(\u0026#34;sparklineanalytics/rgoogleslides\u0026#34;, build_vignettes = TRUE) # Install the googledrive R package from the master branch devtools::install_github(\u0026#34;hairizuanbinnoorazman/googledrive\u0026#34;, build_vignettes = TRUE) Let\u0026rsquo;s see how we can get started:\n# Analysis on IRIS dataset library(ggplot2) library(googledrive) library(rgoogleslides) library(png) # Authorization functions googledrive::authorize() rgoogleslides::authorize() First, we\u0026rsquo;ll load all the packages required for the analysis. We will be performing an analysis on the iris dataset using the ggplot2 R package. The ggplot2 R package generates a graph for us to present our analysis, but in this use case, we won\u0026rsquo;t be doing anything too complex.\nThe authorization functions are called slightly differently from how you would normally call R functions. In order to have a common interface across the packages, the authorization functions share the same name: authorize().\nHowever, this comes with its own problem: if you load both the googledrive and rgoogleslides libraries, the authorize() from whichever package is loaded last masks the other one. Prefixing the function with the package name and :: ensures that the function from the right package is called. 
In plain English, googledrive::authorize() means: call the authorize function from the googledrive package.\nThe authorize function is similar to RGA\u0026rsquo;s authorize function, where you can input your own client id and client secret, but that would be a blog post for another time.\n# Do up a quick plot on iris dataset first_plot \u0026lt;- qplot(iris$Sepal.Length, iris$Sepal.Width, color = iris$Species) ggsave(\u0026#34;first_plot.png\u0026#34;, first_plot) # Determine the dimensions of the image (dim() returns height, width, channels) image \u0026lt;- png::readPNG(\u0026#34;first_plot.png\u0026#34;) dimension \u0026lt;- dim(image) image_height \u0026lt;- dimension[1]/8 # Calculate to your requirements image_width \u0026lt;- dimension[2]/8 # Calculate to your requirements # Upload image to Google drive id \u0026lt;- googledrive::upload_file(\u0026#34;first_plot.png\u0026#34;) # Retrieve the image_id allocated to the image by Google drive image_id \u0026lt;- id$id We first create the plot that we need on the iris dataset and save it locally before pushing it to be hosted on Google drive.\nAn important step is to obtain the image width and image height. We need those to calculate the position of the image on the slide. 
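The sizing arithmetic can be sketched in Python as follows. Two details are worth making explicit. First, the array returned by png::readPNG is laid out as height by width by channels, so the first dimension is the height. Second, the Slides API measures page element sizes and offsets in EMU, at 914400 per inch or 12700 per point. The scale factor of 8 comes from the R snippet. Treating the scaled numbers as points, the centering logic and the default page size (a 10 by 5.625 inch, 16:9 page) are all assumptions for illustration. They do not necessarily reflect what `aligned_page_element_property` does.

```python
EMU_PER_POINT = 12700   # 914400 EMU per inch / 72 points per inch
PAGE_WIDTH_PT = 720     # assumed 16:9 page, 10 inches wide
PAGE_HEIGHT_PT = 405    # assumed 16:9 page, 5.625 inches tall


def image_size_from_dim(dim, scale=8):
    # dim follows R's dim() of a readPNG array: (height, width, channels).
    height, width = dim[0], dim[1]
    return width / scale, height / scale


def centered_element_properties(page_id, width_pt, height_pt):
    # Hypothetical helper: a PageElementProperties-style dict that places an
    # image of the given size (in points) in the middle of the assumed page.
    return {
        'pageObjectId': page_id,
        'size': {
            'width': {'magnitude': width_pt * EMU_PER_POINT, 'unit': 'EMU'},
            'height': {'magnitude': height_pt * EMU_PER_POINT, 'unit': 'EMU'},
        },
        'transform': {
            'scaleX': 1, 'scaleY': 1,
            'translateX': (PAGE_WIDTH_PT - width_pt) / 2 * EMU_PER_POINT,
            'translateY': (PAGE_HEIGHT_PT - height_pt) / 2 * EMU_PER_POINT,
            'unit': 'EMU',
        },
    }


# Example: a PNG that is 1500 pixels tall and 2100 pixels wide.
width_pt, height_pt = image_size_from_dim((1500, 2100, 3))
print(width_pt, height_pt)  # 262.5 187.5
props = centered_element_properties('p', width_pt, height_pt)
print(props['transform']['translateX'])  # 2905125.0
```

If the width and height are swapped by reading the dimensions in the wrong order, the image ends up with a transposed aspect ratio. That is why the order of `dim()` matters here.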
You will see this in action in the next code snippet.\nThe final portion is to save the image id into a variable so that we can reference it later.\n# Create a new googleslides presentation slide_id \u0026lt;- rgoogleslides::create_slides(\u0026#34;Test Analysis\u0026#34;) slide_details \u0026lt;- rgoogleslides::get_slides_properties(slide_id) # Obtain the slide page that the image is to be added to slide_page_id \u0026lt;- slide_details$slides$objectId # Get the position details of the element on the slide page_element \u0026lt;- rgoogleslides::aligned_page_element_property(slide_page_id, image_height = image_height, image_width = image_width) request \u0026lt;- rgoogleslides::add_create_image_request(url = image_id, page_element_property = page_element) response \u0026lt;- rgoogleslides::commit_to_slides(slide_id, request) This is where the magic happens.\nWe first create a blank slides presentation for this test and then retrieve the details of the slides. We need to get the id of the page on the Googleslide which we will be editing.\n(Note: I realize that the normal conventions for how people talk about slides are kind of vague. 
Some call it a deck; others call it slides; and when referring to one of the slides in the slides/deck, we call it a slide too, which makes it really confusing here.\nSo, rather than sticking to that, we will refer to a \u0026ldquo;slide\u0026rdquo; in the slides/deck as a page instead, so that it is clear what is being referred to.)\nWe create a page element which dictates the position and transformation applied to the element being added to the page (in this case, an image).\nHere, we feed in the image id, from which the package internally constructs the required url for the Google Slides API to consume and present the image.\nThe final slides would kind of look like this:\nThere may be a bit of fine tuning and fixing that both packages require, but this is a potential use case of how the packages would interoperate with each other.\nHere is the full script for your convenience:\n# Analysis on IRIS dataset library(ggplot2) library(googledrive) library(rgoogleslides) library(png) # Authorization functions googledrive::authorize() rgoogleslides::authorize() # Do up a quick plot on iris dataset first_plot \u0026lt;- qplot(iris$Sepal.Length, iris$Sepal.Width, color = iris$Species) ggsave(\u0026#34;first_plot.png\u0026#34;, first_plot) # Determine the dimensions of the image image \u0026lt;- png::readPNG(\u0026#34;first_plot.png\u0026#34;) dimension \u0026lt;- dim(image) image_height \u0026lt;- dimension[1]/8 image_width \u0026lt;- dimension[2]/8 # Upload image to Google drive id \u0026lt;- googledrive::upload_file(\u0026#34;first_plot.png\u0026#34;) image_id \u0026lt;- id$id # Create a new googleslides presentation slide_id \u0026lt;- rgoogleslides::create_slides(\u0026#34;Test Analysis\u0026#34;) slide_details \u0026lt;- rgoogleslides::get_slides_properties(slide_id) # Obtain the slide page that the image is to be added to slide_page_id \u0026lt;- slide_details$slides$objectId # 
Get the position details of the element on the slide page_element \u0026lt;- rgoogleslides::aligned_page_element_property(slide_page_id, image_height = image_height, image_width = image_width) request \u0026lt;- rgoogleslides::add_create_image_request(url = image_id, page_element_property = page_element) response \u0026lt;- rgoogleslides::commit_to_slides(slide_id, request) There are still some weird issues where, after some time, the script starts throwing authentication errors, but after refreshing the token and rerunning the script from top to bottom, the script becomes runnable once more. These will be fixed in the future.\nExpect more features and functions to come to both packages!! Check the progress with the following links:\nrgoogleslides: https://github.com/hairizuanbinnoorazman/rgoogleslides\ngoogledrive: https://github.com/hairizuanbinnoorazman/googledrive\n","date":"11 May 2017","externalUrl":null,"permalink":"/sending-ggplot-graphs-to-googleslides/","section":"Posts","summary":"IMPORTANT:\nTHE FOLLOWING BLOG POST IS OUTDATED. THERE IS AN UPDATE TO THE GOOGLESLIDES API WHICH DISABLES USAGE OF GOOGLE DRIVE IMAGES. NOW ALL IMAGES HAVE TO BE FROM PUBLIC SOURCES. THERE IS A FEATURE REQUEST TICKET CREATED TO ADD THIS FUNCTIONALITY BACK BUT THERE IS A HIGH LIKELIHOOD IT WON’T BE BACK FOR A LONG TIME\n","title":"Sending ggplot graphs to googleslides","type":"posts"},{"content":"In the initial draft of the rgoogleslides package, there were several wrapper functions that call the Google Slides API immediately when they are used. Some of the examples are below:\nreplace_text create_shape create_table Under the hood, each of these functions invokes internal functions that first create an R list to manage requests and add the request details into that list, before immediately making the call to the Google Slides API. 
The idea behind this is to provide common code that all the functions can use, and also to avoid exposing users to computing concepts such as passing an \u0026ldquo;object\u0026rdquo; around. To summarize, the wrapper functions above simplify the package by packaging the API in plain R functions.\nUnfortunately, this means that the API is not being utilized to its fullest. Given that we are limited to 40000 API calls a day (Link), we need to reduce the number of calls as much as possible, which is why we need a different way of calling the API from R. We need a way to batch the slide requests into bigger requests, which means a complete restructuring of the rgoogleslides package.\nThe next release of the rgoogleslides package will remove the wrapper functions and give users access to the request builder functions. A blog post detailing this will be released soon. Apologies for any breaking changes between the package versions.\nIf you are still dependent on the previous release, the code for that has been tagged accordingly and can still be installed via the following lines of code:\n# Install initial version of rgoogleslides # Initial version has been tagged as v0.1.0-alpha # Install devtools R package if you have not installed it yet install.packages(\u0026#34;devtools\u0026#34;) library(devtools) install_github(\u0026#34;hairizuanbinnoorazman/rgoogleslides\u0026#34;, ref=\u0026#34;v0.1.0-alpha\u0026#34;) ","date":"10 May 2017","externalUrl":null,"permalink":"/restructuring-rgoogleslides/","section":"Posts","summary":"In the initial draft of the rgoogleslides package, there were several wrapper functions that call the Google Slides API immediately when they are used. 
Some of the examples are below:\n","title":"Restructuring Rgoogleslides","type":"posts"},{"content":"RGA is one of the packages I often use in my line of work, and I use it to extract data from the Google Analytics platform into R. From there, I can easily use data manipulation packages such as dplyr and tidyr to get the results I need before pushing those results back to Googlesheets via the googlesheets R package.\nHowever, one thing I found that could cause issues is that packages such as RGA and googlesheets actually make use of your own credentials. It\u0026rsquo;s convenient, but if you leave the team/company, you can literally bid farewell to that script. The script will start to hit authentication issues which render it useless.\nOne solution to this is to use a generic account which is given access to all these accounts. This generic account would belong to the company, so even if the person moves teams or companies, the script would keep working.\nI wouldn\u0026rsquo;t recommend that though; it feels like putting all your eggs in one basket. If that method was used, there would be a need to do password maintenance on the account. There is also a possibility that users who have access to that account could use it for malicious purposes, and it would be hard to catch the person responsible, since the actions would be performed by the \u0026ldquo;common\u0026rdquo; account.\nAnother solution is to use a service account. Google Cloud provides these to support server-to-server interactions, such as those between a web application and a Google service. Link. 
This completely fits our scenario, where we have a script (which is a machine) talking to Google Analytics (which is also a machine).\nHere\u0026rsquo;s how to set it up:\nGet a Google Cloud Service Account Register that account into Google Analytics Access Google Analytics data via RGA Creating a Google Cloud Service Account # Go to the following link to access Google Cloud. If you haven\u0026rsquo;t registered for it before, just follow the drill of signing up for the service before you\u0026rsquo;ll be able to access it. http://console.cloud.google.com/ You do not need to have billing enabled for this. We are only going to create a service account, which happens to be free. Click on the top left hand corner to access the menu Click on API manager On this page, you can choose to set up your own app credentials to support your RGA script. Choose to enable the Analytics API (this is v3 of the Google Analytics API, which RGA was still using at the time of writing this post) Click on the credentials (see above image). After which, you should see the next image You will need to follow the next set of instructions to get your service account Go to the OAuth Consent Screen and fill in your product name (this seems to be the only field that is necessary for now). You will need to do this or else you will not be allowed to create a service account. Go back to the credentials selection (similar to the image above) and choose to make a new service account. You should see the following screen. You do not need to give your account any Google Cloud role, but do give it a sensible name. We will go with the default JSON key type. After you do that, you can then click the Create button. At this point, you will automatically download a json file (service account key) - DO NOT EVER LOSE IT. BUT IF YOU DO, JUST MAKE ANOTHER ONE. 
Take note of the Service account email (it ends in @\u0026lt;Your project\u0026gt;.iam.gserviceaccount.com) That concludes the creation of a Google Cloud Service Account Adding Service Account Credentials into Google Analytics # Ensure that your account is able to edit permissions of the Google Analytics accounts and properties. If you do have that permission level, you can add the service account email as a user, just as you would add any other user to Google Analytics. Just follow the instructions here: https://support.google.com/analytics/answer/2884495?hl=en\nIf you do not do this step, the service account will not have access to the Google Analytics data and you will be unable to utilize it in your script.\nAccess Google Analytics using the service account via R # And now, we can finally get to the code section of the Google Analytics data extraction!!\nRGA doesn\u0026rsquo;t directly expose service account authentication. However, that won\u0026rsquo;t deter us; internally, the package uses the httr package for authentication, and httr supports both the user-authenticated flow and the service account flow. RGA simply does not expose the service account flow to you, the user. 
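The silent token creation works because a Google service account uses the OAuth 2.0 JWT bearer flow. The script signs a short-lived set of claims with the private key from the JSON key file. It then exchanges the signed claims at the key's token_uri for an access token. The Python sketch below only builds the claim set, because signing and the HTTP exchange need a crypto library. The claim names iss, scope, aud, iat and exp, and the one-hour maximum lifetime, follow Google's documentation for server-to-server applications. The function names and the sample key values are hypothetical.

```python
import json
import time

ANALYTICS_SCOPE = 'https://www.googleapis.com/auth/analytics'


def load_key(path):
    # The downloaded JSON key file; its client_email is also the address
    # that gets added as a user in Google Analytics.
    with open(path) as handle:
        return json.load(handle)


def jwt_claims(key, scopes, now=None, lifetime=3600):
    # Claim set for the JWT bearer flow used by service accounts.
    # Google caps the lifetime (exp - iat) at one hour.
    issued_at = int(time.time() if now is None else now)
    return {
        'iss': key['client_email'],   # the service account email
        'scope': ' '.join(scopes),    # space-delimited list of scopes
        'aud': key['token_uri'],      # token endpoint taken from the key file
        'iat': issued_at,
        'exp': issued_at + min(lifetime, 3600),
    }


# Hypothetical, trimmed key contents (a real key file also has a private_key).
key = {
    'client_email': 'rga-reader@my-project.iam.gserviceaccount.com',
    'token_uri': 'https://oauth2.googleapis.com/token',
}
claims = jwt_claims(key, [ANALYTICS_SCOPE], now=1490000000)
print(claims['exp'] - claims['iat'])  # 3600
```

In R, httr::oauth_service_token takes care of the signing and the exchange, which is why no consent screen appears.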
\nlibrary(jsonlite) library(httr) library(RGA) # Getting the token for future access to the account endpoints \u0026lt;- httr::oauth_endpoints(\u0026#39;google\u0026#39;) secrets \u0026lt;- jsonlite::fromJSON(\u0026#34;./location-of-your-service-account-file.json\u0026#34;) scope \u0026lt;- \u0026#39;https://www.googleapis.com/auth/analytics\u0026#39; token \u0026lt;- httr::oauth_service_token(endpoints, secrets, scope) # Utilizing the token to access the Google Analytics data random_view_id \u0026lt;- \u0026#39;2134151\u0026#39; RGA::authorize(token=token) RGA::get_ga(random_view_id) The code snippet has only 2 parts. The top portion uses the httr package to authenticate the service account, which then gives the service account temporary access (a token) to the Google Analytics data. While authenticating, you will not see any Google authentication screen. Instead, the token value is just created and assigned silently. This is ideal if you are running some daily/weekly/monthly process of extracting and processing data on a server.\nThe bottom portion uses the token to actually hit the Google Analytics service and retrieve the data.\nThat\u0026rsquo;s it for this tutorial on accessing Google Analytics with a service account. Happy trying!\n","date":"21 March 2017","externalUrl":null,"permalink":"/using-service-accounts-with-rga/","section":"Posts","summary":"RGA is one of the packages I often use in my line of work, and I use it to extract data from the Google Analytics platform into R. 
From there, I can easily utilize data manipulation packages such as dplyr and tidyr to get the results I would need before pushing those results back to Googlesheets via the googlesheets R package.\n","title":"Using Service Accounts with RGA","type":"posts"},{"content":"Setting up microservices is really quite hard. It\u0026rsquo;s not just about the technology; it also involves the culture and habits a team needs in order to have the discipline, and the ease, to create services that scale well. One has to rethink a lot of the usual best practices and theory. I have summarized some points I found interesting about how some companies deal with and manage microservices.\nImportant: These are not best practices; they are tips on how to get microservices working.\nUse some sort of cloud/virtualization technology (although I say it is not really just about the technology, this is the minimum needed). Why do you need this? So that you can easily scale your individual components. Imagine using the traditional approach of buying hardware in order to scale up: the moment you attempt to scale, you will be told that you have to wait a couple of months just to get that hardware up and running. Some components one would need in order to get the whole thing up and running are:\nVMs or Docker containers to host the compute services\nEvent bus - The one component that handles all the messages generated by services. Events are distributed among all the services in a pub/sub fashion.\nCentralized logging - For convenience in understanding what is happening in each service within the microservices world.\nServices/apps in the microservices architecture have to be small and nimble, and each should do only one thing and do it well. 
Ideally, it should be coded such that the whole model of that service fits in a single developer\u0026rsquo;s head after just reading it, which keeps it maintainable. With such a small codebase, each service should also be easily replaceable: if you try to scale a component and it starts straining beyond a certain point, just kill it.\nIf you need a library or shared code base to be distributed among your services, don\u0026rsquo;t do that. Instead, create a distributed service and let that handle all the load/work.\nEach service should expect errors. Networks fail and other components will fail; you just need to ensure that the service can withstand all that.\nCentralize your logging with a distributed service. If you have tens or hundreds of microservices and each one runs 4-5 copies of itself, you would need to be crazy to go into every single one of those services to get the logs.\nBattle between async and sync services. Some of the speakers were all for async microservices, as that seems to be the natural tendency of the microservices world. However, async services are not easily understood by developers, who tend to think in a synchronous way where processes occur in some orderly, linear fashion. Whichever you choose, stick with the paradigm of communicating any events that occur back to the event bus.\nReferences\nHere are some of the videos I used as references for the points mentioned above:\nDealing with microservices for the frontend. Most talks out there cover how microservices handle back-end services, but this is one of the few that gives ideas on how to do the front end in the microservices world. https://www.youtube.com/watch?v=wIID1wHZWpg The philosophy behind how microservices came about. 
It is interesting how he found inspiration for the microservice idea by looking into biology and emulating its design. https://www.youtube.com/watch?v=-UKEPd2ipEk A pretty nice summary of some of the challenges one would face when building microservices. https://www.youtube.com/watch?v=Ztjk4RGc4gI ","date":"10 February 2017","externalUrl":null,"permalink":"/tips-on-microservices/","section":"Posts","summary":"Setting up microservices is really quite hard. It’s not just about the technology; it also involves the culture and habits a team needs in order to have the discipline, and the ease, to create services that scale well. One has to rethink a lot of the usual best practices and theory. I have summarized some points I found interesting about how some companies deal with and manage microservices.\n","title":"Tips on Microservices","type":"posts"},{"content":"Although it is often mentioned in many online tutorials and wikis that R is an object-oriented language, the code examples on the web definitely don\u0026rsquo;t show many hints of that. Most code tutorials and examples do not showcase such language features and instead mainly use functions to get things done.\nThis is partly because R\u0026rsquo;s main target audience tends to be the technologically savvy analyst who just needs a quick tool to power through their data manipulation and data analysis work. 
They don\u0026rsquo;t particularly want to deal with computer science concepts such as classes and objects.\nMost tasks can be done with just functions and loops, but after a while you will realize that your code would be much better if related functions and settings were bundled into a single entity, or in this case, an object.\nBut if you are curious about this language feature, where can you find examples of how it is applied?\nThere are already hints in some of the common packages. If you have used packages such as RGA or bigrquery, you may have noticed that they generate a token object, which they use for authorization with Google services. Both packages are powered by the httr package.\nIf you look at the source code in the oauth-token.r file linked below, you will see the code that declares the token class that does the heavy lifting for you. The average user doesn\u0026rsquo;t feel that complexity; you just see and feel the magic of how easy it is to connect to the various internet services out there.\nhttps://github.com/r-lib/httr/blob/master/R/oauth-token.r\nSo from this code, we can research the object-oriented features further by exploring the R6Class function from the R6 package. 
Below are some useful resources on R6:\nAn awesome introduction to object-oriented features in R: https://cran.r-project.org/web/packages/R6/vignettes/Introduction.html\nSome things to take note of: https://cran.r-project.org/web/packages/R6/vignettes/Portable.html\nBut let\u0026rsquo;s not stop there; let\u0026rsquo;s build a working class!\nlibrary(R6)\n# Declaring the account class\nAccount \u0026lt;- R6Class(\u0026#34;Account\u0026#34;,\n  public = list(\n    accountNumber = NULL,\n    balance = NULL,\n    initialize = function(accountNumber = NA, balance = NA) {\n      self$accountNumber = accountNumber\n      self$balance = balance\n    },\n    getAccountNumber = function() {\n      return(self$accountNumber)\n    },\n    getBalance = function() {\n      return(self$balance)\n    },\n    setBalance = function(balance) {\n      self$balance = balance\n    },\n    # A credit adds money to the account\n    credit = function(amount) {\n      self$balance = self$balance + amount\n    },\n    # A debit takes money out of the account\n    debit = function(amount) {\n      self$balance = self$balance - amount\n    },\n    # R6 print methods should write to the console and return the object invisibly\n    print = function(...) {\n      writeLines(paste(\u0026#34;Account Number:\u0026#34;, self$accountNumber, \u0026#34;, Balance:\u0026#34;, self$balance))\n      invisible(self)\n    }\n  )\n)\n# Instantiating a new Account object\naccount \u0026lt;- Account$new(1234, 12)\n# Get the current balance of the account\n# Answer: 12\naccount$getBalance()\n# Credit 4 into the account\naccount$credit(4)\n# Debit 3 out of the account\naccount$debit(3)\n# Get the current balance of the account\n# Answer: 13\naccount$getBalance()\nBy looking at the code, you can see how it feels \u0026ldquo;packaged\u0026rdquo;. Anything that requires manipulating the account is handled via the account object. 
The functions that manipulate the account, as well as its current state, are all stored within it.\nAlthough this is a pretty simple example, you can easily build on it to create much more complex classes, but we\u0026rsquo;ll explore that in another blog post.\n","date":"9 January 2017","externalUrl":null,"permalink":"/r-is-an-object-oriented-language/","section":"Posts","summary":"Although it is often mentioned in many online tutorials and wikis that R is an object-oriented language, the code examples on the web definitely don’t show many hints of that. Most code tutorials and examples do not showcase such language features and instead mainly use functions to get things done.\n","title":"R is an object oriented language?","type":"posts"},{"content":" Some time late last year, the Googleslides API was announced by Google. This was a pretty exciting piece of news; one that took so long to come.\nSlides API Announced By Google\nWith the API now available, anyone who wants to automate \u0026ldquo;presentation slide\u0026rdquo; work can now generate such slides with scripts, removing the last manual step when it comes to presenting data to people. If you already have a script that does the heavy lifting of transforming data from various sources into relevant tables, all you need is a few more lines to send the data straight to slides. (YES!! No more mundane work of changing numbers in monthly slides)\nHowever, Google does not provide client libraries for every language (it does cover mainstream languages such as Java and Python). 
And R is not one of them, which is a pity, since R is one of the languages that would benefit most from a package that does this magic.\nSo late last year, I was fortunate to have some free time to put together a small R package that talks to the Googleslides API, and here it is!!\nhttps://github.com/hairizuanbinnoorazman/rgoogleslides\nIt is not on CRAN yet, although I plan to tidy it up and move it there as soon as I have more free time to work on it further. (Update: It\u0026rsquo;s now on CRAN: https://cran.r-project.org/web/packages/rgoogleslides/index.html)\nSo before we end this post, let me quickly go through one of the major functions in this package.\nLet\u0026rsquo;s say we have a monthly presentation of 2 slides (this is just an example)\nTitle Slide\nSecond Slide\nEach month, you would need to edit the month and year on the slides. That may not be a problem for just 2 slides, but imagine a huge slide deck with monthly report comparisons. Updating all the information on the slides quickly becomes a mundane task.\nHence, let\u0026rsquo;s alter the slides: change June 2016 to {month-year}\nTitle Slide\nSecond Slide\nWe can now use the Googleslides R package to update the slides. Get the presentation ID (the part after /d/) from the URL:\nhttps://docs.google.com/presentation/d/1EtDqjWDXXXXXBYVdAJo/edit#slide=id.g1af69dd764_0_63\nNext, install the package by following the instructions on GitHub:\ninstall.packages(\u0026#34;devtools\u0026#34;)\nlibrary(devtools)\ndevtools::install_github(\u0026#34;hairizuanbinnoorazman/rgoogleslides\u0026#34;, build_vignettes = TRUE)\nThen, run the following in RStudio:\nlibrary(rgoogleslides)\nauthorize()\nreplace_all_text(\u0026#34;1EtDqjWDXXXXXBYVdAJo\u0026#34;, \u0026#34;June 2016\u0026#34;, \u0026#34;{month-year}\u0026#34;)\nCongratulations! 
You have written your first R script for manipulating Google Slides.\nThis approach is inspired by the following tutorial in the Google Slides API documentation: https://developers.google.com/slides/how-tos/merge#example\nThe package is still under heavy development, and more features are being added to make it more useful. So, if you find any bugs or have any feature requests, just add them to the issues of the repository and I will see if I can incorporate the suggestions and implement those features.\n","date":"3 January 2017","externalUrl":null,"permalink":"/googleslides-r-package-in-the-making/","section":"Posts","summary":" Some time late last year, the Googleslides API was announced by Google. This was a pretty exciting piece of news; one that took so long to come.\n","title":"Googleslides R Package in the Making","type":"posts"},{"content":"","date":"1 December 2016","externalUrl":null,"permalink":"/categories/data-science/","section":"Article Categories","summary":"","title":"Data Science","type":"categories"},{"content":"","date":"1 December 2016","externalUrl":null,"permalink":"/tags/data-science/","section":"Technology Tags","summary":"","title":"Data Science","type":"tags"},{"content":"Domino released a pretty comprehensive paper on data science maturity models in organizations.\nLink to the paper: https://www.dominodatalab.com/p/data-science-maturity-model-ungated\nData Science is not something that a company can just immediately buy from the market (hiring and tooling); it\u0026rsquo;s much more than that. It requires time for the organization to get used to having a Data Science team and the practices that the team preaches. 
However, having a C-level executive who believes in the impact of Data Science would definitely help get the rest of the company engaged.\nEven if a company preaches analytics and tries its best to get the tools and talent in place, many things can still prevent it from reaching its full potential.\nFor example, a wiki and proper file sharing services are set up in the organization, and employees are encouraged to use them to store their files, analyses, etc. Some of the files automate huge amounts of work by moving data across various Excel files and consolidating it into a single file for easier analysis. However, even with those tools in place, employees sometimes keep recreating the analysis/automation Excel files from scratch over and over again. Some possible reasons why this happens:\nThere is no proper process/procedure for sharing those files/processes on the platform.\nThe files are not labelled with metadata, and relying on the file name alone doesn\u0026rsquo;t help much in guessing what a file does.\nExcel files for automation are usually single-purpose: they are highly customized to do a certain set of functions very well. Because of that, very few people are available to maintain them moving forward (you usually need a huge amount of business knowledge to understand what is happening in the code).\nNew team members are not told where such resources are. Most of the time, there isn\u0026rsquo;t a guide that points teams to those files and tells users what the files do.\nIt\u0026rsquo;s interesting to see how such scenarios crop up, but it is even more interesting to wonder whether such problems can be solved at all. 
(No, your solution shouldn\u0026rsquo;t be to just use Excel; there are plenty of people in the corporate world who swear by Excel and believe that it can solve any problem - believe me, I was there\u0026hellip; The lack of versioning, the awkward automation via VBA scripts)\n","date":"1 December 2016","externalUrl":null,"permalink":"/data-science-maturity-in-an-organization/","section":"Posts","summary":"Domino released a pretty comprehensive paper on data science maturity models in organizations.\nLink to the paper: https://www.dominodatalab.com/p/data-science-maturity-model-ungated\nData Science is not something that a company can just immediately buy from the market (hiring and tooling); it’s much more than that. It requires time for the organization to get used to having a Data Science team and the practices that the team preaches. However, having a C-level executive who believes in the impact of Data Science would definitely help get the rest of the company engaged.\n","title":"Data Science Maturity in an Organization","type":"posts"},{"content":"","date":"19 November 2016","externalUrl":null,"permalink":"/categories/google-slides/","section":"Article Categories","summary":"","title":"Google Slides","type":"categories"},{"content":"","date":"19 November 2016","externalUrl":null,"permalink":"/tags/google-slides/","section":"Technology Tags","summary":"","title":"Google Slides","type":"tags"},{"content":"Earlier this year, Google announced a couple of new APIs for its productivity and office suite, namely the Google Slides API\nhttps://gsuiteupdates.googleblog.com/2016/05/new-ways-to-keep-data-flowing-between.html\nIt has been a while since the announcement at Google I/O earlier this year; I\u0026rsquo;ve been anticipating the arrival of the API ever since, and it\u0026rsquo;s finally here!!\nGoogle Slides API homepage https://developers.google.com/slides/\nWith this API in 
place, the workflow for most data analysis work (at least for those who mainly use Google Apps) can be crafted end to end. In my case, since I mostly use R for my data manipulation work, I can create a workflow that looks something like this:\nExtract data from the Google Analytics or Google BigQuery platforms using the R packages RGA and bigrquery.\nManipulate and summarize the data using dplyr, tidyr and other packages.\nPossibly run machine learning models using caret or its underlying ML packages.\nFinally, put the learnings into a Google Slides template via the Google Slides API.\nThe slides would then be ready for presenting to the audience with less effort than usual.\nAlthough having the new API is great, unfortunately there is no R package for the Slides API yet, so I am writing one in my spare time.\nJust keep a lookout on this blog for future news of the googleslides R package!\n","date":"19 November 2016","externalUrl":null,"permalink":"/slides-api-announced-by-google/","section":"Posts","summary":"Earlier this year, Google announced a couple of new APIs for its productivity and office suite, namely the Google Slides API\nhttps://gsuiteupdates.googleblog.com/2016/05/new-ways-to-keep-data-flowing-between.html\nIt has been a while since the announcement at Google I/O earlier this year; I’ve been anticipating the arrival of the API ever since, and it’s finally here!!\n","title":"Slides API announced by Google!","type":"posts"},{"content":"Categories group articles into broader areas such as DevOps, software development, automation, and personal projects.\n","externalUrl":null,"permalink":"/categories/","section":"Article Categories","summary":"Categories group articles into broader areas such as DevOps, software development, automation, and personal projects.\n","title":"Article Categories","type":"categories"},{"content":"Practical implementation notes, 
experiments, and lessons learned from building and operating software.\n","externalUrl":null,"permalink":"/","section":"Experiment, Fail, Learn, Repeat","summary":"Practical implementation notes, experiments, and lessons learned from building and operating software.\n","title":"Experiment, Fail, Learn, Repeat","type":"page"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"},{"content":"Use tags to find focused articles about technologies such as Kubernetes, Golang, Google Cloud, Docker, Helm, and Electron.\n","externalUrl":null,"permalink":"/tags/","section":"Technology Tags","summary":"Use tags to find focused articles about technologies such as Kubernetes, Golang, Google Cloud, Docker, Helm, and Electron.\n","title":"Technology Tags","type":"tags"}]