10 posts tagged with "aperture"

· 6 min read
Pradeep Sharma

The FluxNinja team is excited to launch “rate-limiting as a service” for developers. This is the start of a new category of essential developer tools built for the AI-first world, which relies heavily on effective and fair usage of programmable web resources.

info

Try out FluxNinja Aperture for rate limiting, and join our community on Discord; we'd appreciate your feedback.

FluxNinja is leading this new category of “managed rate-limiting service” with a first-of-its-kind, reliable, battle-tested product. Since its first release in 2022, FluxNinja has gone through multiple iterations based on feedback from the open source community and paying customers. We are excited to bring the stable 1.0 version of the service to the public.

· 11 min read
Karanbir Sohi

In the fast-evolving space of generative AI, OpenAI's models are the go-to choice for most companies building AI-driven applications. But that may change soon as open-source models catch up, offering much better economics and data privacy through self-hosting. One notable competitor in this space is Mistral AI, a French startup known for its innovative and lightweight models, such as the open-source Mistral 7B. Mistral has gained attention in the industry, particularly because its model is free to use and can be self-hosted.

However, generative AI workloads are computationally expensive, and due to the limited supply of Graphics Processing Units (GPUs), scaling them up quickly is a complex challenge. Given the insatiable hunger for LLM APIs within organizations, there is a potential imbalance between demand and supply. One possible solution is to prioritize access to LLM APIs based on request criticality while ensuring fair access among users during peak usage. At the same time, it is important to ensure that the provisioned GPU infrastructure gets maximum utilization.

In this blog post, we will discuss how FluxNinja Aperture's Concurrency Scheduling and Request Prioritization features significantly reduce latency and ensure fairness, at no added cost, when executing generative AI workloads using the Mistral 7B Model. By improving performance and user experience, this integration is a game-changer for developers focusing on building cutting-edge AI applications.
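To make the idea concrete, here is a minimal sketch of criticality-based concurrency scheduling: requests queue behind a fixed concurrency limit, and freed slots go to the most critical waiting request first. This is an illustrative toy, not Aperture's actual scheduler (which uses weighted-fair queuing and tokens); all names here are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

@dataclass(order=True)
class Request:
    priority: int            # lower value = more critical
    seq: int                 # arrival order breaks ties (FIFO within a tier)
    name: str = field(compare=False)

class ConcurrencyScheduler:
    """Admit at most `limit` in-flight requests; queue the rest by priority."""

    def __init__(self, limit: int):
        self.limit = limit
        self.in_flight = 0
        self.queue: list[Request] = []
        self._seq = count()

    def submit(self, name: str, priority: int) -> list[str]:
        """Enqueue a request; return any requests admitted as a result."""
        heapq.heappush(self.queue, Request(priority, next(self._seq), name))
        return self._drain()

    def complete(self) -> list[str]:
        """Mark one in-flight request done, freeing a slot for the queue."""
        self.in_flight -= 1
        return self._drain()

    def _drain(self) -> list[str]:
        admitted = []
        while self.queue and self.in_flight < self.limit:
            req = heapq.heappop(self.queue)
            self.in_flight += 1
            admitted.append(req.name)
        return admitted
```

With a limit of one, an interactive request that arrives while a batch request is waiting jumps ahead of it as soon as a slot frees up, which is the behavior that keeps latency low for critical traffic during GPU saturation.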

· 8 min read
Gur Singh
Nato Boram
info

This guest post is by CodeRabbit, a startup utilizing OpenAI's API to provide AI-driven code reviews for GitHub and GitLab repositories.

Since its inception, CodeRabbit has experienced steady growth in its user base, comprising developers and organizations. Installed on thousands of repositories, CodeRabbit reviews several thousand pull requests (PRs) daily. We have previously discussed our use of an innovative client-side request prioritization technique to navigate OpenAI rate limits. In this blog post, we will explore how we manage to deliver continuous, in-depth code analysis cost-effectively, while also providing a robust, free plan to open source projects.

· 12 min read
Gur Singh
Suman Kumar
Nato Boram
info

This is a guest post by CodeRabbit, a startup that uses OpenAI's API to provide AI-driven code reviews for GitHub and GitLab repositories.

Since CodeRabbit launched a couple of months ago, it has received an enthusiastic response and hundreds of sign-ups. CodeRabbit has been installed in over 1300 GitHub organizations and typically reviews more than 2000 pull requests per day. Furthermore, usage continues to grow rapidly, with healthy week-over-week growth.

While this rapid growth is encouraging, we've encountered challenges with OpenAI's stringent rate limits, particularly for the newer gpt-4 model that powers CodeRabbit. In this blog post, we will delve into the details of OpenAI rate limits and explain how we leveraged FluxNinja's Aperture load management platform to ensure a reliable experience as we continue to grow our user base.
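For context on what a client has to do when it hits those limits without a load management layer, here is a minimal sketch of the standard coping mechanism: retrying 429 responses with exponential backoff and jitter. The `RateLimitError` and `call` callable are stand-ins for whatever your OpenAI client raises and invokes; this is not CodeRabbit's or Aperture's actual code.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 error raised by an API client."""

def call_with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff and full jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Full jitter spreads retries out to avoid thundering herds.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Backoff alone only delays work, though; it cannot decide *which* requests deserve the scarce quota, which is where server-side prioritization earns its keep.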

· 18 min read
Sudhanshu Prajapati

Even thirty years since its inception, PostgreSQL continues to gain traction, thriving in an environment of rapidly evolving open source projects. While some technologies appear and vanish swiftly, others, like the PostgreSQL database, prove their longevity and withstand the test of time. It has become the preferred choice of many organizations, powering everything from general-purpose data storage to an asteroid-tracking database, with some companies running PostgreSQL clusters holding petabytes of data.

· 3 min read
Sudhanshu Prajapati
Karanbir Sohi

San Francisco — FluxNinja is thrilled to announce the General Availability of its innovative open source tool, Aperture. This cutting-edge solution is designed to enable prioritized load shedding driven by observability and graceful degradation of non-critical services, effectively preventing total system collapse. Furthermore, Aperture intelligently auto-scales essential resources only when necessary, resulting in significant infrastructure cost savings.

· 13 min read
Sudhanshu Prajapati

In today's world, the internet is the most widely used technology. Everyone, from individuals to products, seeks to establish a strong online presence. This has led to a significant increase in users accessing various online services, resulting in a surge of traffic to websites and web applications.

Because of this surge in user traffic, companies now prioritize estimating the number of potential users when launching new products or websites, since capacity constraints can lead to downtime. For example, after the announcement of ChatGPT 3.5, there was a massive influx of traffic and interest from people all around the world. In such situations, it is essential to have load management in place to avoid possible business loss.

· 7 min read
Sudhanshu Prajapati

Graceful degradation and managing failures in complex microservices are critical topics in modern application architecture. Failures are inevitable and can cause chaos and disruption. However, prioritized load shedding can help preserve critical user experiences and keep services healthy and responsive. This approach can prevent cascading failures and allow for critical services to remain functional, even when resources are scarce.
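The core of prioritized load shedding can be sketched in a few lines: each request tier tolerates a different level of system load, so as load climbs, background work is dropped first while critical flows keep running. The tiers and thresholds below are hypothetical illustrations, not Aperture's actual policy values.

```python
# Hypothetical load tolerance per tier: background sheds first,
# critical only when the system is nearly saturated.
SHED_THRESHOLDS = {
    "critical": 0.95,
    "default": 0.80,
    "background": 0.60,
}

def should_shed(tier: str, load: float) -> bool:
    """Shed a request when normalized load (0..1) exceeds its tier's limit."""
    return load > SHED_THRESHOLDS[tier]
```

At 70% load, for instance, background jobs are rejected while checkout-style critical requests proceed untouched; in a real deployment the load signal would come from an observability pipeline rather than a single number.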

To help navigate this complex topic, Tanveer Gill, the CTO of FluxNinja, presented at Chaos Carnival 2023 (March 15-16), a virtual conference with pre-recorded sessions. Attendees could still interact with the speakers, who were present throughout their sessions.

· 15 min read
Sudhanshu Prajapati

Service meshes are becoming increasingly popular in cloud-native applications, as they provide a way to manage network traffic between microservices. Istio, one of the most popular service meshes, uses Envoy as its data plane. However, to maintain the stability and reliability of modern web-scale applications, organizations need more advanced load management capabilities. This is where Aperture comes in, offering several advanced features.

· 19 min read
Sudhanshu Prajapati

In today's world of rapidly evolving technology, it is more important than ever for businesses to have systems that are reliable, scalable, and capable of handling increasing levels of traffic and demand. Yet even the most well-designed microservices systems can experience failures or outages. There are several examples from the past where companies like Uber, Amazon, Netflix, and Zalando faced massive traffic surges and outages. In the case of Zalando (a shoes and fashion company), the whole cluster went down; one contributing factor was high latency, which caused critical payment methods to stop working and impacted both customers and the company. The outage cost them real money, and in its wake, companies like these began adopting the graceful degradation paradigm.