<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Zero Down Time]]></title><description><![CDATA[Strategies and Techniques of Providing High QoS in Distributed Systems]]></description><link>https://www.zerodowntime.dev</link><image><url>https://substackcdn.com/image/fetch/$s_!_jP7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2df01723-fe70-42d2-a39d-e24206330abe_667x667.png</url><title>Zero Down Time</title><link>https://www.zerodowntime.dev</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 13:55:09 GMT</lastBuildDate><atom:link href="https://www.zerodowntime.dev/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Nobel Khandaker]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[0downtime@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[0downtime@substack.com]]></itunes:email><itunes:name><![CDATA[Nobel Khandaker]]></itunes:name></itunes:owner><itunes:author><![CDATA[Nobel Khandaker]]></itunes:author><googleplay:owner><![CDATA[0downtime@substack.com]]></googleplay:owner><googleplay:email><![CDATA[0downtime@substack.com]]></googleplay:email><googleplay:author><![CDATA[Nobel Khandaker]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[On the Shoulders of Giants]]></title><description><![CDATA[Usefulness of LLMs and Agentic Coding in software engineering]]></description><link>https://www.zerodowntime.dev/p/on-the-shoulders-of-giants</link><guid isPermaLink="false">https://www.zerodowntime.dev/p/on-the-shoulders-of-giants</guid><dc:creator><![CDATA[Nobel 
Khandaker]]></dc:creator><pubDate>Sat, 13 Sep 2025 15:14:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!m8Uf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m8Uf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m8Uf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m8Uf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png" width="240" height="240" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:240,&quot;bytes&quot;:145420,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://0downtime.substack.com/i/173482947?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m8Uf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!m8Uf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d4f0273-6f56-43d8-911b-4dcf844890a1_2000x2000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Isaac Newton once quoted &#8220;If I have seen further, it is by standing on the shoulders of giants&#8221;.  
LLM and agentic coding technologies are the giants now roaming our software engineering domain.  The use of LLMs and agentic coding tools to improve software engineering is a hotly debated subject with strong arguments on both sides.  For curious readers, plenty of examples of <a href="https://prismic.io/blog/claude-code">successes</a>, <a href="https://lore.kernel.org/all/CAHk-=wjamixjqNwrr4+UEAwitMOd6Y8-_9p4oUZdcjrv7fsayQ@mail.gmail.com/">failures</a>, and <a href="https://www.thoughtworks.com/en-us/insights/blog/generative-ai/claude-code-codeconcise-experiment">mixed</a> results are available in blogs and online discussions.</p><p>Our engineering team has been using agentic coding tools for the last six months to streamline day-to-day software engineering tasks, e.g., code review, root-cause analysis of production incidents, writing unit tests, and onboarding new team members.  We have improved our sprint velocity, written better code with higher code coverage, and resolved production incidents more quickly.  Here is a collection of tips we have found useful and use cases that showcase the usefulness of agentic coding in writing and maintaining software.  The Claude Code commands used in these use cases can be found in this GitHub <a href="https://github.com/nobelk/claude-commands">repo</a>. 
</p><div><hr></div><h3>Tips to help with Agentic Coding</h3><ul><li><p>If you do not have a good grasp of the language and the framework in use, you will likely <em>not</em> be effective or efficient with the agentic coding tools.</p></li><li><p>Always review all changes made by the agent: you are the owner.</p></li><li><p>Direct the agent with very specific and detailed instructions/prompts.</p></li><li><p>Create a feedback loop for the agentic tool: Analyze &#8594; make changes &#8594; write tests to verify the changes.</p></li><li><p>For complex tasks and changes, ask the agent to think hard and justify its solution.</p></li><li><p>Watch out for the cost of long-running tasks (consider a cheaper model).</p></li></ul><div><hr></div><h3>Use case 1: Reviewing Pull Requests</h3><pre><code>def convert_data(df: DataFrame) -&gt; bytes:
    buffer = bytearray(b"")
    for row in df._data:
        row_dump = dumps(dict(zip(df._columns, row)))
        buffer.extend(f"{row_dump}\n".encode("utf-8"))
    return gzip.compress(bytes(buffer))

session = aioboto3.Session()

async with session.client(
    "s3",
    region_name = "ap-east-1",
    config = get_aioboto_config(),
  ) as s3_client:
  await s3_client.put_object(
    Body = convert_data(self),
    Bucket = bucket,
    Key = key,
  )</code></pre><h4>Agent recommendations</h4><ul><li><p>Reuse the session, and possibly use connection pooling.</p></li><li><p>Improve the performance of the <code>convert_data</code> function:</p><ul><li><p><em>eliminate</em> 1) multiple memory allocations, 2) per-row encoding, 3) buffer extension and manipulation</p></li><li><p>add a short circuit for empty data</p></li></ul></li></ul><pre><code>   if not df._data:
       return gzip.compress(b"")
   columns = df._columns
   json_lines = [dumps(dict(zip(columns, row))) for row in df._data]
   res = "\n".join(json_lines) + "\n"
   return gzip.compress(res.encode("utf-8"))</code></pre><h3>Use case 2: Root-cause analysis of a production incident</h3><p>We had been experiencing some type of memory leak in our heavily used Java web service running on EKS in AWS.  The symptoms of the leak were Java connection timeouts to the storage services.  The web service is built with Java Spring and an ORM.  By enabling <code>spring.datasource.hikari.leak-detection-threshold</code> in the deployment, we were able to collect a detailed stack trace of the error in production.</p><h4>Exceptions</h4><pre><code>LOGGING_FORMAT: json
   caller_class_name: com.zaxxer.hikari.pool.ProxyLeakTask
   caller_file_name: ProxyLeakTask.java
   caller_line_number: 84
   caller_method_name: run
   level: WARN
   level_value: 30000
   logger_name: com.zaxxer.hikari.pool.ProxyLeakTask
   message: Connection leak detection triggered for org.postgresql.jdbc.PgConnection@123456c on thread http-nio-8080-exec-5, stack trace follows
   stack_trace: java.lang.Exception: Apparent connection leak detected
&#9;at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
&#9;at org.hibernate.engine.jdbc.connections.internal.ConnectionProviderImpl.getConnection(ConnectionProviderImpl.java:122)
&#9;at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
...</code></pre><h4>Agent recommendations</h4><p>Once the exception details and the stack trace were fed into Claude Code (Opus 4.1), it fixed the issue:</p><ul><li><p>Identified the root cause of the issue (using transactions for S3 read/write operations)</p></li><li><p>Made the code change (using transactions only for DB access and removing them for S3 access) and added unit tests to verify the change</p></li><li><p>Ran the unit tests and verified its change</p></li></ul><h4>Cost</h4><pre><code>Total cost:            $65.09
    Total duration (API):  25m 28.1s
    Total duration (wall): 17h 32m 53.9s
    Total code changes:    1012 lines added, 36 lines removed
Usage by model:
claude-3-5-haiku:  70.5k input, 3.8k output, 0 cache read, 0 cache write
claude-opus-4-1:  1.2k input, 48.6k output, 23.4m cache read, 1.3m cache write
claude-sonnet:  168 input, 7.7k output, 2.1m cache read, 164.0k cache write</code></pre>]]></content:encoded></item><item><title><![CDATA[Ring-Based Deployments and Testing]]></title><description><![CDATA[Eliminate or reduce regressions and downtimes]]></description><link>https://www.zerodowntime.dev/p/ring-based-deployments-and-testing</link><guid isPermaLink="false">https://www.zerodowntime.dev/p/ring-based-deployments-and-testing</guid><dc:creator><![CDATA[Nobel Khandaker]]></dc:creator><pubDate>Sat, 14 Jun 2025 02:30:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_jP7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2df01723-fe70-42d2-a39d-e24206330abe_667x667.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There is a continuous struggle happening in today's software development landscape. On one side are the developers, delivery managers, and business owners who would like to make their work (product features) available to users quickly; on the other side are the QA engineers and DevOps engineers who would like to take some time to test and ensure that we avoid regressions and service outages at all costs while we deliver those new features.</p><p>Both sides have valid points. Delivering features quickly seems to be a key component of capturing market share. On the other hand, a service outage causes users real pain and translates into revenue loss and, in the worst case, loss of market share or user base.</p><p>Well, how about we hire an army of amazing engineers who write flawless code with 100% unit test coverage? We can just push that code to production quickly, right? Wrong.  Testing (unit, manual, etc.) 
can only show the presence of bugs, not their absence.</p><p><em>During a service incident, several engineers on one of my previous teams had to spend days debugging an issue caused by <strong>a single instance</strong> of case-sensitive string comparison.</em></p><h2><em>Ring-based deployments</em></h2><p>Over time, every software component accumulates interdependencies with other components in the architecture. When we deploy a sizable change to a component, or a new component, the probability of introducing bugs increases. These could be caused by logical errors, faulty code, and/or integration issues. Given this high probability, there needs to be a way to mitigate any negative impact on the customer base while we deploy code to the production environment.</p><p>Ring-wise continuous deployment (CD) is one such method to mitigate the risk of customer impact while achieving high velocity of feature delivery.</p><ul><li><p>Divide all environments where the software component is available into several rings of availability.</p></li><li><p>Starting from the developer's machine (Ring 0) and ending at the final production environment (Ring 3), <em><strong>continually deploy and test software features from one ring to the next until they are available to all users</strong></em>.</p></li></ul><p>The exact implementation of the rings depends on the type of software, the team, and the target user base. A sample implementation:</p><ul><li><p>Ring 0: Developer's machine</p></li><li><p>Ring 1: Staging environment</p></li><li><p>Ring 2: Production preview or beta customers</p></li><li><p>Ring 3: Production or real customers</p></li></ul><p>The key point to remember is:</p><p><em>The infrastructure and the deployed code in a ring should represent the next ring as closely as possible.</em></p><p>The underlying need to deploy software without causing disruptions in the production environment arises from customer-first/user-first thinking. 
In that case, the tech, product, and operations teams all work towards achieving the optimal experience for their customers. The additional work for ring-wise deployment and testing is part of that cost.</p>]]></content:encoded></item><item><title><![CDATA[Here Be Dragons]]></title><description><![CDATA[Hidden Risks of Software Migration]]></description><link>https://www.zerodowntime.dev/p/here-be-dragons</link><guid isPermaLink="false">https://www.zerodowntime.dev/p/here-be-dragons</guid><dc:creator><![CDATA[Nobel Khandaker]]></dc:creator><pubDate>Sat, 14 Jun 2025 02:08:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KogA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KogA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KogA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 424w, https://substackcdn.com/image/fetch/$s_!KogA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 848w, https://substackcdn.com/image/fetch/$s_!KogA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KogA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KogA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png" width="728" height="359" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9370b14f-73c0-4721-82ef-93226d731857_1674x826.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:3058266,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://0downtime.substack.com/i/158626526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KogA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 424w, https://substackcdn.com/image/fetch/$s_!KogA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 848w, 
https://substackcdn.com/image/fetch/$s_!KogA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 1272w, https://substackcdn.com/image/fetch/$s_!KogA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9370b14f-73c0-4721-82ef-93226d731857_1674x826.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Every few years, the technology innovations continue to change the way we write, use, and maintain software solutions. 
In the last decade, the rise of AI, cloud computing, and open source technology solutions has completely transformed information technology. Software can now be written with less effort, deployed without building any physical infrastructure, and delivered to millions of users with relatively little effort.</p><p>The ever-changing technology landscape continually obligates us, as technology professionals, to keep pace and modernize our existing software and services. An organization&#8217;s need to change its software and services could be driven by a business need such as improving its economics or creating a new product. The need could also be a matter of survival for the organization as the underlying software or hardware technology ages toward sunset.</p><p><code>Migration is the process through which an organization updates the code, data, or infrastructure.</code></p><p>Migration could be as simple as upgrading the database to a more recent version or as complex as moving the infrastructure from an on-premises datacenter to a public cloud such as AWS.  For complex scenarios, completing the migration without any negative impact on the ongoing business or operations is by no means an easy affair. The cost and effort often put enormous strain on the profitability of an organization, disrupt its daily operations, and in the worst case could end up destroying the organization. There are three main sources of the high risk and challenges associated with migrations:</p><h4>Technical Debt</h4><p>Martin Fowler defines <strong>cruft</strong> as the &#8220;<em>deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further</em>&#8221;.  Ward Cunningham describes technical debt as a way of thinking about <strong>cruft</strong> as a financial liability that needs to be paid over time. 
Examples of technical debt could be a lack of unit tests, missing documentation, customized code that can only run on a specific piece of hardware, or a bloated database that has not been cleaned up in a long time and cannot be backed up.</p><p><em>Case study:  I have worked on a migration project where the technical debt was not calculated accurately ahead of time.  This caused the project to take three times the planned time and cost and resulted in the technical lead leaving the organization.</em></p><h4>The Three Cs - communication, collaboration, and coordination</h4><p>Unless the engineering team is in a small, flat organization, a software migration will often require support from other teams, and the engineering team needs to coordinate its efforts closely with those teams.  The priorities and goals of each team are different, so unless all the managers and upper management are fully focused on making the migration a success, the migration project is likely to fail or face significant hurdles.</p><p><em>Case study:  I have worked on a migration project whose success required a different team to run a re-indexing process. They initially agreed to it but refused to do it at the end of the project because of the cost.  This caused the migration project to almost fail and had various career consequences for the members of the team that was in charge of the migration.</em></p><h4>TCO - total cost of ownership</h4><p>The total cost of ownership of completing a migration is notoriously difficult to measure. The technical, product, and business wings of any organization need to work hand in hand to achieve the common goal of that organization. Every stage of the migration process has direct costs, such as time and resources, as well as an opportunity cost. 
Since there are uncertainties that plague the planned work, the total cost cannot be accurately measured ahead of time.</p><p><em>Case study:  I have worked on an on-prem to cloud migration project that was delayed by one year because of inaccurate accounting of the total work that needed to be completed.  The project kept going forever, upper management could not get an approximate date of completion, and this migration delay caused several other projects to be delayed for an unforeseen amount of time, causing significant customer dissatisfaction and negative economic impact.</em></p><p>But why would you, the reader, care about these nuances?  Surely, you have a team of very strong engineers and run a tight ship.  Hear me out: like sailors sailing through uncharted seas, be afraid, be very afraid.  You may end up paying more than you bargained for.  To be continued &#8230;</p>]]></content:encoded></item><item><title><![CDATA[Zen and The Art of Software Testing]]></title><description><![CDATA[How we test modern software]]></description><link>https://www.zerodowntime.dev/p/zen-and-the-art-of-software-testing</link><guid isPermaLink="false">https://www.zerodowntime.dev/p/zen-and-the-art-of-software-testing</guid><dc:creator><![CDATA[Nobel Khandaker]]></dc:creator><pubDate>Sat, 25 Jan 2025 23:52:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-RyS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was a bright sunny morning when I read the unbelievable news - the Mars Climate Orbiter was lost because of a unit mismatch.  The software did not convert between English and metric units, and tragedy ensued. 
This gives an idea of how important the practice of software testing is!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-RyS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-RyS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-RyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png" width="453" height="339.75" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:453,&quot;bytes&quot;:539525,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-RyS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!-RyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714e3e8c-bd76-4927-aabd-0def13773a41_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As the complexity of software projects grows, so does the difficulty of testing them. With the growth of cloud technologies, the number of users of a piece of software can now reach hundreds of millions or even billions. When mistakes are made, the impact of those mistakes can now be catastrophic. Here are a few software testing practices that I have found useful in assuring high QoS for software and services.</p><h2>Wait, AI could Test Everything!</h2><p>It is true that AI agents have been making great progress in automating the writing and execution of software tests, but they still have a long way to go.  A lot of that has to do with pure combinatorics.  Let&#8217;s assume we have a program that has 10,000 functions, each of those functions has 5 input parameters, and we use the AI agent to write one unit test per parameter, each executing in 10 ms.  The total time required to execute those unit tests would be 500 seconds (about 8.3 minutes) per run, and that is a single case per parameter and just unit tests; covering combinations of parameter values multiplies that number rapidly.  
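</p><p>As a rough sketch of that arithmetic, the snippet below recomputes the estimate in Python. The function and constant names are made up for illustration, and the 10 values per parameter used for the exhaustive case is a hypothetical assumption, not a measurement.</p><pre><code># Back-of-the-envelope estimate of brute-force unit test execution time.
# All numbers are illustrative assumptions based on the example in the text.

FUNCTIONS = 10_000        # functions in the program
PARAMS_PER_FUNCTION = 5   # input parameters per function
MS_PER_TEST = 10          # each unit test executes in 10 ms
VALUES_PER_PARAM = 10     # hypothetical: values tried for each parameter


def one_test_per_parameter_seconds():
    """Run time with one test for each (function, parameter) pair."""
    tests = FUNCTIONS * PARAMS_PER_FUNCTION
    return tests * MS_PER_TEST / 1000


def exhaustive_seconds():
    """Run time when every combination of parameter values is tested."""
    tests = FUNCTIONS * VALUES_PER_PARAM ** PARAMS_PER_FUNCTION
    return tests * MS_PER_TEST / 1000


if __name__ == "__main__":
    simple = one_test_per_parameter_seconds()
    full = exhaustive_seconds()
    print(f"one test per parameter: {simple:.0f} s ({simple / 60:.1f} min)")
    print(f"exhaustive combinations: {full / 86_400:.0f} days")
</code></pre><p>Under these assumptions, a single pass with one test per parameter takes 500 seconds, while exhaustively combining just 10 values per parameter would take roughly 116 days, which is where the combinatorics start to bite.</p><p>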
Executing other types of tests would take even longer.  So, relying on sheer numbers and brute-force execution will be too slow to be practical in a CI/CD pipeline.  We need engineers who can judiciously test the software without taking too much time<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;10000 \\times 5 \\times 0.01 = 500 \\text{ s} = \\frac{500}{60} \\text{ min} \\approx 8.3 \\text{ min}&quot;,&quot;id&quot;:&quot;PTGMWUAUHH&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><h3>Types of Software Tests</h3><h4><em>Unit Test</em></h4><p>Unit testing is the first line of defense against those creepy crawlies. Once, when I did not add enough unit tests to my code, a senior developer asked me, "How do you know that your code works?" When practicing TDD (test-driven development), you write a test for your new piece of code, run the test (which will fail), and then write code until that failing test passes. Unit tests are usually white-box tests: tests that know how a unit/function operates. While building a car, a unit test would be measuring and verifying the current produced by the alternator.</p><h4>Functional tests</h4><p>Functional tests take on the dependencies on external libraries and services to test the functionality of an entire computer program or module. While building a car, functional testing would be measuring the torque while running the engine with the input of gas and battery. Tools such as <a href="https://www.sonarqube.org/">SonarQube</a> can help you track the results and coverage of your unit and functional tests on demand or as part of the CI/CD pipeline.</p><h4>Integration Tests</h4><p>These are the heavyweights of the testing world. They are designed to test the software end-to-end within a real-world environment. You can think of these as driving the automobile on a road to test how well it drives. 
In this test, all components of the car work in unison to produce the driving outcome. Integration tests are usually the most time-consuming tests to write and the most difficult to maintain.</p><h4>Performance Tests</h4><p>So, the car drives fine on a plain, obstacle-free road, but what about when the going gets tough? Performance tests are all about testing the program as a whole under load. You can think of this as driving the car with maximum load (passengers or goods). Performance tests are usually run after all other testing has been completed and the infrastructure and code have reached a certain level of stability. Tools such as <a href="https://jmeter.apache.org/">JMeter</a> are useful for running performance/load tests on APIs.</p><h4>Monitoring</h4><p>In cloud computing, the major components/modules of a piece of software are hosted in the cloud and updated frequently by a moderately large group of engineers. To ensure QoS, the critical components of the software are monitored continuously. The monitor is usually a program that measures the health of some service or executes a set of defined steps over and over. When the health metric of the service falls below a threshold or one of those steps fails, the monitor alerts the supporting engineering team. In our car example, it would be like checking the battery power and the engine temperature while the car is being driven. Finding the right balance between effective monitoring and alerting noise takes continual testing and readjustment of alert thresholds.</p><h4><em>Optics</em></h4><p>While monitors detect instantaneous failures, optics are a way of looking at the performance and history of the software in the real world over a longer term. Optics for a piece of software may provide insights into its usage (growth of user base), performance (response time during peak and off-peak hours), etc. 
For our car example, optics would be like collecting and analyzing gas mileage data for the car over a month to verify its actual mileage. A successful optics infrastructure should contain a small set of basic reports and the ability to export collected data.</p><h4>Testing in Production (TIP)</h4><p>Replicating the scale and complexity of the production environment in a dev or testing environment is notoriously difficult. So, functional tests run in the dev or test environment may fail to surface subtle bugs. To overcome this, engineers often isolate a smaller set of users within the production environment and test the software release there by running manual and automated tests and by reviewing the monitoring and optics.</p><h2>Time-Tested Best Practices</h2><ul><li><p>Unit tests should complete in milliseconds.</p></li><li><p>Code coverage should be fairly high (<strong>&gt;= 95%</strong>) to be effective against regressions or bugs.</p></li><li><p>Unit tests are the best tool to avoid regressions when refactoring your legacy code.</p></li><li><p>When integrated with the build pipeline, unit tests are the best <em>type of documentation</em> for your code.</p></li><li><p>Incorporating unit tests in the CI/CD pipeline usually helps prevent regressions.  
</p></li><li><p>Functional, performance, and UI tests all have significant development and maintenance costs.</p></li><li><p>Manual testing remains useful for catching bugs in complex software solutions and services.</p><p></p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Quantum computing may help with the execution time, but it is not commercially available yet.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The Art of Feature Flighting]]></title><description><![CDATA[Changing the wheels of a vehicle in motion]]></description><link>https://www.zerodowntime.dev/p/the-art-of-feature-flighting</link><guid isPermaLink="false">https://www.zerodowntime.dev/p/the-art-of-feature-flighting</guid><dc:creator><![CDATA[Nobel Khandaker]]></dc:creator><pubDate>Sat, 25 Jan 2025 22:40:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!W55z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Feature Flighting and the QoS</h2><p>The fast continuous integration and deployment cycles adopted by the majority of software organizations mean greater code churn: new features are released and old features are deprecated on a regular basis. Although there are usually multiple testing environments where engineers can run and test their code, those environments often cannot match the scale (number of users, requests, etc.) of the production environment. As a result, the true test of new software occurs only when it runs in the production environment.</p><p>So, we need a mechanism to test new code and features in the production environment without impacting the user experience.  
We could first segment the users according to some criteria such as geolocation, user type, etc. Once divided, we could choose one or more of those user segments to use the new app and test all functionality of the new feature or app in the production environment. This mechanism is called feature flighting. A more complete discussion of feature flighting can be found <a href="https://martinfowler.com/articles/feature-toggles.html">here</a>.</p><p>The implementation could be as follows. When flighting an API, we could add logic that checks the context (location id in this example) and then routes the request to the legacy or the new function according to that context.</p><h4>Feature Flighting Example</h4><pre><code>from fastapi import FastAPI

app = FastAPI()

# Locations whose traffic is routed to the new code path.
location_whitelist = {'x', 'y'}

@app.get("/items/{location_id}")
def read_item(location_id: str):
    # Whitelisted locations get the new feature; all others stay on legacy.
    if location_id in location_whitelist:
        return new_read_item()
    return legacy_read_item()

def legacy_read_item():
    # Placeholder for the existing implementation.
    pass

def new_read_item():
    # Placeholder for the new implementation.
    pass
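
# Sketch only (not from the original post): the same routing decision as a
# plain function, testable without the web app. `choose_handler` and its
# parameters are hypothetical names.
def choose_handler(location_id, whitelist, new_handler, legacy_handler):
    # Anything not explicitly whitelisted falls through to the legacy default.
    if location_id in whitelist:
        return new_handler
    return legacy_handler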
</code></pre><p>The following tools are popular choices for flighting features in the production environment.</p><ul><li><p><a href="https://github.com/github/scientist">GitHub Scientist</a></p></li><li><p><a href="https://getunleash.io/">Unleash</a></p></li></ul><h4>Best Practices to Avoid Incidents</h4><ul><li><p>Your feature flighting logic should use a whitelisting approach and route the request to the new feature only if the context or the user is in the whitelist.</p><ul><li><p>The default/legacy code path should continue to work if the flight check fails.</p></li></ul></li><li><p>Always create tasks to remove the added flighting logic once all users have been migrated to the new feature and the testing is complete.</p></li></ul><h2>Use case: Food Delivery App Migration using Geospatial Feature Flighting</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W55z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W55z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 424w, https://substackcdn.com/image/fetch/$s_!W55z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 848w, https://substackcdn.com/image/fetch/$s_!W55z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 1272w, 
https://substackcdn.com/image/fetch/$s_!W55z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W55z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png" width="267" height="236.84779050736498" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:542,&quot;width&quot;:611,&quot;resizeWidth&quot;:267,&quot;bytes&quot;:176909,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W55z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 424w, https://substackcdn.com/image/fetch/$s_!W55z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 848w, https://substackcdn.com/image/fetch/$s_!W55z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 1272w, 
https://substackcdn.com/image/fetch/$s_!W55z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f468a22-462d-4240-a9cc-9fcb154dc9c4_611x542.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In 2022, I was leading the engineering team of a tech startup in Bangladesh. One of the services offered by this startup was an app-based food delivery service for ~10M customers in the city of <a href="https://maps.app.goo.gl/NragScBqGbihdmHx9">Dhaka</a>, Bangladesh. Customers would order food through its app, and the delivery drivers would get the order, pick up the food from the restaurant, and deliver it to the address listed on the order for a fee. After rewriting the legacy food app, the management team of the startup wanted to migrate the users from the old app to the new app while maintaining a high QoS and avoiding service interruptions. The issue was that the refactored architecture, code, and storage systems of the food app could not be tested against a large-scale user base ahead of time, so the migration risked service disruptions and outages.</p><h3>Problem</h3><div><hr></div><p>How can we choose a smaller set of users and delivery personnel to use the new version of the food delivery app so that the impact of any bugs remains isolated to that small set, without impacting the rest of the users? As the newly designed service stabilizes, we would like to gradually onboard all users to the new app.</p><h4>Solution: Strangler Pattern and Geographical Zones</h4><div><hr></div><p>The novel solution was to divide all users in Dhaka city into several smaller subgroups based on their geographical locations (latitude, longitude): Uttara, Mirpur, etc. Then we could choose one or more of those smaller zones and onboard them to the new app one zone at a time.  
This gradual migration will allow us to test and stabilize the new app while avoiding any major service outages.  </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5kkw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5kkw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 424w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 848w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 1272w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5kkw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png" width="498" height="421.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:664,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:55447,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5kkw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 424w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 848w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 1272w, https://substackcdn.com/image/fetch/$s_!5kkw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9f7a6-c7b1-436a-bc8e-737651fe8b94_664x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The solution steps were as follows:</p><ul><li><p>Continue running the legacy and the new app side by side in the production environment</p><ul><li><p>This required us to onboard the support and delivery personnel so that they could use both the new and the legacy backend dashboards</p></li></ul></li><li><p>Design a geolocation web service to map a user&#8217;s geolocation to one of the geographical zones</p></li><li><p>Use feature flighting to route all network traffic from the user app (order, payment, etc.) 
to either the legacy app or the new app based on a user&#8217;s geographical zone</p></li><li><p>Use the strangler pattern to gradually transition the geographical zones from the legacy system to the new system</p></li><li><p>When all users&#8217; zones have been migrated to the new services, sunset the legacy services</p></li></ul><h4>End Result</h4><p>Using geographical feature flighting, our engineering team successfully migrated the users one zone at a time and stabilized the new app <em>without any</em> service outages.</p>]]></content:encoded></item></channel></rss>