· 11 min read
Tanveer Gill
Sudhanshu Prajapati

Service mesh technologies such as Istio have brought great advancements to the cloud-native industry with load management and security features. Despite these advancements, however, applications running on a service mesh are still unable to prioritize critical traffic during peak load. As a result, real-world problems persist, such as back-end services getting overloaded by excess traffic, leading to a degraded user experience. While it is nearly impossible to predict when a sudden increase in load will happen, it is possible to bring a crucial capability to service meshes to handle such scenarios: observability-driven load management.

In this blog post, we will discuss the benefits of Istio and how FluxNinja Aperture can elevate Istio into a robust service mesh that handles unexpected load by leveraging advanced load management capabilities.

· 6 min read
Hasit Mistry

Web services must be equipped with the ability to anticipate and manage unpredictable traffic surges, a crucial requirement for businesses with a growing online presence. Failure to do so can lead to a degraded user experience, resulting in potential revenue loss over time. Users expect seamless and reliable service, and any disruptions or downtime can severely impact a business's reputation.

By taking a proactive approach to managing unpredictable traffic surges, web services can ensure the ongoing satisfaction of their users and the long-term success of their business. It is crucial for businesses to prioritize user journeys and invest in solutions that can help navigate these challenges.

· 12 min read
Gur Singh
Suman Kumar
Nato Boram
info

This is a guest post by CodeRabbit, a startup that uses OpenAI's API to provide AI-driven code reviews for GitHub and GitLab repositories.

Since CodeRabbit launched a couple of months ago, it has received an enthusiastic response and hundreds of sign-ups. CodeRabbit has been installed in over 1300 GitHub organizations and typically reviews more than 2000 pull requests per day. Usage continues to grow at a rapid pace, with healthy week-over-week growth.

While this rapid growth is encouraging, we've encountered challenges with OpenAI's stringent rate limits, particularly for the newer gpt-4 model that powers CodeRabbit. In this blog post, we will delve into the details of OpenAI rate limits and explain how we leveraged FluxNinja's Aperture load management platform to ensure a reliable experience as we continue to grow our user base.
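As context for the approach discussed in the post, a common first line of defense against 429 (rate limit) responses is client-side retry with exponential backoff and jitter. The sketch below is a generic Python illustration, not CodeRabbit's or OpenAI's actual implementation; `RateLimitError` and `request_fn` are hypothetical stand-ins for whatever your API client raises and calls.

```python
import random
import time


class RateLimitError(Exception):
    """Raised by a (hypothetical) API client when the server returns 429."""


def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a rate-limited call with exponential backoff and full jitter.

    `request_fn` is any zero-argument callable that raises RateLimitError
    on a 429 response. Generic sketch only, not CodeRabbit's actual logic.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller.
            # Exponential backoff with full jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Client-side backoff alone wastes work and offers no prioritization across requests, which is where a centralized load management layer such as Aperture comes in.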

· 9 min read
Jai Desai
Sudhanshu Prajapati

At a Glance:

  • The blog aims to demystify the process of coupling HashiCorp Consul, a widely adopted service mesh, with FluxNinja Aperture, a platform specializing in observability-driven load management.
  • The HashiCorp Consul and FluxNinja Aperture technical teams collaborated to enable seamless integration, facilitated through Consul's Envoy extension system and leveraging features like external authorization and OpenTelemetry access logging.
  • By integrating the two platforms, the service reliability and performance of networked applications can be significantly improved. The synergy offers adaptive rate limiting, workload prioritization, global quota management, and real-time monitoring, turning a previously manual, siloed process of traffic adjustments into an automated, real-time operation.

In the dynamic world of software, modern applications are undergoing constant transformation, breaking down into smaller, nimble units. This metamorphosis accelerates both development and deployment, a boon for businesses eager to innovate. Yet, this evolution isn't without its challenges. Think of hurdles like service discovery, ensuring secure inter-service communication, achieving clear observability, and the intricacies of network automation. That's where HashiCorp Consul comes into play.

· 11 min read
Harjot Gill
Tanveer Gill

Over the last decade, significant investments have been made in large-scale observability systems, accompanied by the widespread adoption of the discipline of Site Reliability Engineering (SRE). Yet an over-reliance on observability alone has led us to a plateau, where we are witnessing diminishing returns in terms of overall reliability posture. This is evidenced by persistent and prolonged app failures, even at well-resourced companies that follow the best practices for observability.

Furthermore, the quest for reliability is forcing companies to spend ever more on observability, rivaling the costs of running the services they aim to monitor. Commercial SaaS solutions are even more expensive, as unpredictable pricing models can quickly inflate the observability bill. The toll isn't only monetary; it extends to the burden shouldered by developers implementing observability and operators tasked with maintaining a scalable observability stack.

· 10 min read
Sudhanshu Prajapati

In modern engineering organizations, service owners don't just build services, but are also responsible for their uptime and performance. Ideally, each new feature is thoroughly tested in development and staging environments before going live. Load tests designed to simulate users and traffic patterns are performed to baseline the capacity of the stack. For significant events like product launches, demand is forecasted, and resources are allocated to handle it. However, the real world is unpredictable. Despite the best-laid plans, below is a brief glimpse of what could still go wrong (and often does):

  • Traffic surges: Virality (Slashdot effect) or sales promotions can trigger sudden and intense traffic spikes, overloading the infrastructure.
  • Heavy-hitters and scrapers: Some outlier users can hog a significant portion of a service's capacity, starving regular user requests.
  • Unexpected API usage: APIs can occasionally be utilized in ways that weren't initially anticipated. Such unexpected usage can uncover bugs in the end-client code. Additionally, it can expose the system to vulnerabilities, such as application-level DDoS attacks.
  • Expensive queries: Certain queries can be resource-intensive due to their complexity or lack of optimization. These expensive queries can lead to unexpected edge cases that degrade system performance; they can also push the system to its vertical scaling limits.
  • Infrastructure changes: Routine updates, especially to databases, can sometimes lead to unexpected outcomes, like a reduction in database capacity, creating bottlenecks.
  • External API quotas: A backend service might rely on external APIs or third-party services, which might impose usage quotas. End users get impacted when these quotas are exceeded.

· 11 min read
Tanveer Gill

Imagine a bustling highway system, a complex network of roads, bridges, tunnels, and intersections, each designed to handle a certain amount of traffic. Now, consider the events that lead to traffic jams: accidents, road work, or a sudden influx of vehicles. These incidents cause traffic to back up, and often a jam in one part of the highway triggers a jam in another. A bottleneck on a bridge, for example, can lead to a jam on the road leading up to it. Congestion creates many complications, from delays and increased travel times to driver frustration over wasted time and burned fuel. These disruptions don’t just hurt the drivers; they hit the whole economy, as goods are delayed and services are disrupted when employees arrive late (and angry) at work.

But highway systems are not left to the mercy of these incidents. Over the years, they have evolved to incorporate a multitude of strategies to handle such failures and unexpected events. Emergency lanes, traffic lights, and highway police are all part of the larger traffic management system. When congestion occurs, traffic may be re-routed to alternate routes. During peak hours, on-ramps are metered to control the influx of vehicles. If an accident occurs, the affected lanes are closed, and traffic is diverted to other lanes. Despite their complexities and occasional hiccups, these strategies aim to manage traffic as effectively as possible.

· 10 min read
Sudhanshu Prajapati

We’ve been hearing a lot about rate limiting these days, as it is rolled out across popular services like Twitter and Reddit. Companies are finding it increasingly important to curb abuse of their services and keep costs under control.

Before I started working as a developer advocate, I built quite a few things, including integrations and services that catered to specific business needs. One thing common to building integrations was the need to be aware of rate limits when making calls to third-party services, to make sure my integrations didn’t abuse their APIs. On the other hand, third-party services also implement their own rate-limiting rules at the edge to avoid being overwhelmed. But how does all this actually work? How do we set it up? What are the benefits of rate limiting? We’ll cover these topics, and then move on to the reasons why adaptive rate limiting is necessary.
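As a taste of how a basic limit is enforced, here is a minimal sketch of the token-bucket algorithm, one of the most common rate-limiting techniques. The class and parameters are illustrative only; they are not Aperture's API or any particular service's implementation.

```python
import time


class TokenBucket:
    """Minimal token-bucket limiter: bursts up to `capacity` requests,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up the bucket based on how much time has passed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example: allow bursts of 10 requests and a sustained 5 requests per second.
limiter = TokenBucket(rate=5, capacity=10)
if not limiter.allow():
    print("429 Too Many Requests")
```

A static per-client limit like this is a good starting point, but it assumes the service's capacity is fixed, which is exactly the assumption adaptive rate limiting revisits.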

· 8 min read
Marta Rogala

Have you ever tried to buy a ticket online for a concert and had to wait or refresh the page every three seconds when an unexpected error appeared? Or have you ever tried to purchase something during Black Friday and experienced moments of anxiety because the loader just kept on… well… loading, and nothing appeared? We all know it, and we all get frustrated when errors occur and we don't know what's wrong with the website and why we can't buy the ticket we want.

· 18 min read
Sudhanshu Prajapati

Even thirty years after its inception, PostgreSQL continues to gain traction, thriving in an environment of rapidly evolving open source projects. While some technologies appear and vanish swiftly, others, like PostgreSQL, prove that they can withstand the test of time. It has become the preferred choice of many organizations for data storage, with use cases ranging from general-purpose storage to an asteroid-tracking database. Companies are running PostgreSQL clusters with petabytes of data.