Loom recently crossed the 10 million user mark. A major driver behind this growth — and one of the main reasons our users love Loom — is how instantly anyone can record and share a video message. The speed and ease are why you can use Loom multiple times a day and get your messages across in an engaging way every time.
For many of our users, editing a video message gets in the way of the end goal: sharing it. And up until now, our editing experience has been slow and unreliable.
Today, I'm very excited to share that the speed of Loom's editing performance is now on par with its creating and sharing performance: you can now edit video messages in Loom with the fastest cloud video editor in the world.
With Instant Editing, it's easy to quickly edit out a fumble here or there and send a polished loom instantly — no matter how long the video is.
We can't wait for you to try it!
This foundation also allows us to build a number of advanced editing features that will allow you to record and share with confidence. After we fine-tune the user interface for Instant Editing, we plan to explore more features such as removing filler words, intro and outro videos, voice filters, and highlight reels (to name a few) over the next year.
If you're curious about how our Instant Editing feature came to be, I get into the technical details below.
Why we made a long-term investment to build the fastest cloud-based video editor
Three years ago, we built Loom's first video editing experience. We knew it was important to support both trimming (removing the beginning or end of a video) and cutting (removing middle sections of a video) to give users the option to quickly polish their video messages before sharing them.
Most online video editors only support trimming because deleting middle sections requires modifying video frames and stitching the pieces back together. This process, known as re-encoding, is computationally expensive and comes at the cost of long loading times.
During the QA of our video trimming feature launch back in 2018, Britt, our Product Design Manager, requested that we make the loading experience instant. After a few conversations, the team agreed: if we were going to make video messaging at work a daily habit, all operations on a Loom (create, edit, share) had to be just as fast as working with a written document.
Why we decided to redo our entire recording layer
When I started to think about how we might be able to deliver this technology, it became obvious that our encoding layer needed to be radically changed and unified around a certain container format.
At the time, we had a Chrome extension and were about to release our desktop app. Our Chrome extension recorded into a single file format, while our desktop app recorded into a playlist container format, which leverages a playlist that specifies a sequence of multiple video files that make up the whole video. I've roughly illustrated the difference below.
However, we started to recognize some critical problems with recording into a single file format:
It was difficult to repair if bad data came in.
Having to process the format on a fleet of stateful servers introduced a single point of failure.
Hosting the entire recording file on a single device or server created scaling issues.
Playlist container formats allowed us to deliver on Instant Editing by treating an edit as the sum of bounded operations on individual video parts. In other words, if a video was made up of 5-second part files, editing a single part on a single machine was guaranteed to be fast: we only had to download, re-encode (with some magic), and upload that one part rather than the entire video file. Moreover, we could parallelize the editing workload across multiple machines.
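To make the "bounded operations" idea concrete, here is a minimal sketch of how a cut could be planned against a playlist of fixed-length parts. It is an illustration only, not Loom's implementation: the names `Part`, `make_parts`, and `plan_cut` are hypothetical, and it assumes every part is 5 seconds long except possibly the last.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One video part file referenced by the playlist (hypothetical model)."""
    index: int
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive


def make_parts(duration: float, part_length: float = 5.0) -> list[Part]:
    """Split a recording of `duration` seconds into consecutive parts."""
    parts: list[Part] = []
    start = 0.0
    while start < duration:
        end = min(start + part_length, duration)
        parts.append(Part(len(parts), start, end))
        start = end
    return parts


def plan_cut(parts: list[Part], cut_start: float, cut_end: float) -> dict[int, str]:
    """Decide what happens to each part when [cut_start, cut_end) is removed.

    - "keep":      the part lies entirely outside the cut and is untouched.
    - "drop":      the part lies entirely inside the cut and leaves the playlist.
    - "re-encode": the part straddles a cut boundary and needs real work.
    """
    plan: dict[int, str] = {}
    for part in parts:
        if part.end <= cut_start or part.start >= cut_end:
            plan[part.index] = "keep"
        elif cut_start <= part.start and part.end <= cut_end:
            plan[part.index] = "drop"
        else:
            plan[part.index] = "re-encode"
    return plan


if __name__ == "__main__":
    # A 30-second video with seconds 7 to 18 cut out of the middle.
    for index, action in plan_cut(make_parts(30.0), 7.0, 18.0).items():
        print(index, action)
```

Under these assumptions a single cut touches at most two parts no matter how long the video is, and each part marked "re-encode" is independent of the others, so the work could be handed to separate machines.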
Product and engineering leaders will sometimes be forced to choose one of two options: do the hard thing and scrap the core building blocks of your system for a better future, or try to build around the limitations of the legacy system. In moments like this, I am reminded of Alan Kay's talk on simplicity. He highlights the necessity of companies scrapping foundational building blocks in pursuit of overall simplicity, resulting in greater performance and faster innovation.
The decision was clear but difficult to make because of how expensive it was in build time. We had to redo the years-old, battle-tested recording layer that powered our Chrome extension. We also had to ensure every new recording layer conformed to a playlist container format, which is non-standard since most native recording APIs have been built with the single-file use case in mind. On top of that, I knew we were going to be building against Mac, Windows, iOS, and Android in the years to come. The entire journey could fill multiple technical writeups, so I will condense it below.
How we built a real-time video transformation layer
Once all of our recording platforms were encoding into a playlist container format, we were able to build an instant editing layer. We knew, as with any product, we first had to lay out the requirements:
The operations of trimming and cutting needed to be instant.
The operation of cutting required both slicing (generating the bordering video frames) and stitching (combining previously disjoint sequences of frames). And all of it had to be instant.
The system had to be just-in-time. That is, it had to be truly instant and couldn't incur upfront computation or storage. No editing the video in the background before save. No cheating.
The system should be generalizable to other editing capabilities we'd want in the future (highlight reels, voice filters, lighting adjustment, etc.).
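One way to picture the just-in-time requirement is to treat an edit as a small list of removed time ranges that is resolved only when the video is played, rather than as a new video rendered ahead of time. The sketch below is an assumption made for illustration and is not a description of Loom's system; `kept_ranges` and `to_source_time` are hypothetical helpers.

```python
from __future__ import annotations


def kept_ranges(duration: float, cuts: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the source time ranges that survive after removing `cuts`.

    Each cut is a (start, end) pair in seconds of the original recording.
    """
    kept: list[tuple[float, float]] = []
    cursor = 0.0
    for start, end in sorted(cuts):
        if start > cursor:
            kept.append((cursor, min(start, duration)))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


def to_source_time(edited_time: float, kept: list[tuple[float, float]]) -> float:
    """Map a position in the edited video back to the original recording."""
    elapsed = 0.0
    for start, end in kept:
        length = end - start
        if edited_time < elapsed + length:
            return start + (edited_time - elapsed)
        elapsed += length
    raise ValueError("time is past the end of the edited video")


if __name__ == "__main__":
    kept = kept_ranges(30.0, [(7.0, 18.0)])
    print(kept)                       # the two surviving ranges
    print(to_source_time(8.0, kept))  # where second 8 of the edit lives in the source
```

Because the edit is only metadata in this sketch, saving it costs nothing up front, and other transformations could in principle be described in the same way.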
With these product requirements in mind, Bruno, a core video engineer, started building a proof of concept. Over the course of a quarter, his proof of concept took editing times from around 20 seconds down to real-time (instant) for each 5-second 4K video part. We struggled with difficult decisions regarding the system architecture but landed on something that is exceptionally fast.
The overarching problem we had to solve was spending the least amount of time possible decoding or encoding video frames, which are both expensive operations. There were many problems we needed to tackle to get to a solution that was fast (re-encoding only necessary GOPs, maximizing multi-threaded encoding, copying unmodified bytes, shifting timestamps on the client, etc.). We'll save the exact details for another more technical blog post.
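As a rough illustration of "re-encoding only necessary GOPs" and "copying unmodified bytes", the sketch below classifies the groups of pictures (GOPs) inside one part for a given cut. It is deliberately simplified and is not the production algorithm: `plan_gops` is a hypothetical name, keyframe timestamps are assumed to be known already, and any GOP that straddles a cut boundary is conservatively marked for re-encoding.

```python
from __future__ import annotations


def plan_gops(
    keyframes: list[float],
    part_end: float,
    cut_start: float,
    cut_end: float,
) -> list[tuple[float, float, str]]:
    """Classify each GOP in a part when [cut_start, cut_end) is removed.

    `keyframes` holds the sorted start time of every GOP in the part, and
    `part_end` is where the last GOP ends. Returns (start, end, action)
    tuples where action is one of:
    - "copy":      untouched by the cut, bytes can be reused as-is.
    - "drop":      entirely inside the cut.
    - "re-encode": straddles a cut boundary, so frames must be regenerated.
    """
    bounds = list(keyframes) + [part_end]
    plan: list[tuple[float, float, str]] = []
    for gop_start, gop_end in zip(bounds, bounds[1:]):
        if gop_end <= cut_start or gop_start >= cut_end:
            action = "copy"
        elif cut_start <= gop_start and gop_end <= cut_end:
            action = "drop"
        else:
            action = "re-encode"
        plan.append((gop_start, gop_end, action))
    return plan


if __name__ == "__main__":
    # A 5-second part with a keyframe every second, cut from 1.5s to 3.5s.
    for gop in plan_gops([0.0, 1.0, 2.0, 3.0, 4.0], 5.0, 1.5, 3.5):
        print(gop)
```

In this example only two of the five GOPs need an encoder at all, which is the kind of saving the list above is pointing at. The copied GOPs that follow the cut would still need their timestamps shifted earlier by the length of the removed range.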
Now that we can edit on top of video just-in-time, we have 75% of our core video team dedicated to migrating the transformation of all our videos to being just-in-time (we'll have another blog post on that coming soon!).
Re-building foundations can pay off
It's not always the right decision to rebuild or rewrite because of the immense cost incurred in time, focus, and money. Our engineering team strongly leans towards practicality when it comes to building complex systems because we know it takes time and effort to operate and scale those systems. In this case, we took a chance, and it paid off. As a result of the efforts of our team, we have significantly better recording and editing speed and stability, which pushes us a step closer towards empowering effective communication, wherever work happens.
What Instant Editing will allow us to build for our users in the future is exciting. We are entering a world where helping our users get their point across (e.g., highlight reels, intro and outro videos) and sound more confident (e.g., voice filters, removing filler words) will be simple — and instant.
As a final note, if you are interested in working with us on novel product problems such as Instant Editing, my DMs on Twitter are open and we check every application on our job board.