Last month, I wrote an open letter to you acknowledging the frequency of outages Frame.io had been experiencing and outlining the steps we had been taking to improve our performance and stability.
In an effort to stay transparent, I wanted to update you on our progress: what we’ve accomplished, and what we’re tackling next. As I said in that first letter, we know how much you rely on Frame.io to do your work, and how unacceptable these interruptions have been.
We’re grateful for your patience and encouraged that our efforts are paying off: we’ve seen a significant improvement in the stability of our platform over the past 30 days. This is the result of a number of important steps we’ve taken.
What we’ve achieved
- Assigned additional personnel dedicated to performance initiatives
- Created a dedicated task force for API performance
- Increased our observability and active monitors in systems where we’ve seen reliability challenges
- Partnered with subject matter experts and AWS to tune our database configuration and vacuum policies
- Launched a substantial update to Storage Accounting, significantly reducing our database load
- Performance-tuned our top API hot paths, in some cases improving performance by up to 50 percent and resulting in a better end-user experience
- Completed the migration of four major subsystems onto our new job system
- Isolated our legacy event bus into a separate Kubernetes cluster, allowing more capacity for API requests from external users
Next up
We continue to closely examine any and all slowdowns in our service, even those that occur only briefly, to ensure that our performance is always what you expect in terms of speed and reliability.
Our next steps include:
- Focus on API performance and the efficiency of larger batch operations
- Database connection pooling, multiplexing, and caching configurations
- Begin work on the infrastructure that powers our new Activity and Notifications systems
- Continue to move async operations to our new job system
- Architect our multi-region approach for storage and data
Once again, we sincerely apologize for the inconvenience you’ve experienced, and want to reinforce how serious we are about the stability of our product. I deeply appreciate your ongoing patience as we continue to address your concerns and feedback.