A year and a half ago, Loom had a single screen recorder client, and we were beginning to feel the limitations of only providing a browser-based extension. Users could quickly record video messages and get instantly shareable links to send to anyone, on any platform, but there were caveats.
We started building Loom’s second recording client with the intent of delivering the overarching benefits of a desktop-based client and freeing ourselves from these limitations.
The Limitations of a Browser-Based Client
A user could record their screen and camera at the same time but as soon as they navigated to another desktop application, the camera bubble would be left behind in the browser. This was a problem, since the ability to display ideas with emotional context is a big part of what makes Loom special, and we knew we needed to provide an uncompromising experience here. The same concept applied to click highlighting and quick-annotation drawing tools, two of our most-requested features.
The other big problem was resolution. A user could record high-fidelity content, but at higher screen resolutions the result would be blurry. Increasing the recording resolution with in-browser technology past 720p on Mac and 1080p on Windows would spike the CPU and result in laggy videos, so this wasn’t an option.
Where to start? 🤔
At the time, our engineering team was six people and we needed to decide on the best approach. Because our user base was fairly evenly split between Windows and Mac, and because of the team’s general passion for JavaScript, we decided to stand on the shoulders of the web giants that came before us and chose Electron as the tool for the job. This let each operating system share the same UI while calling into its own native recording layer to unlock 4k recording.
To build an experience that continued to get out of users’ way and let them record, our windowing system could not follow the standard conventions of operating system windows. As you can see below, clicking our menu bar icon on Mac opens four windows.
Building a Scalable Windowing System
This challenge presented us with two options: put everything inside one giant, transparent window, or separate each component into its own independent window. Prioritizing flexibility and control over how each component behaved within the operating system, we decided it would be best to treat each component as its own window. Having operating-system-level control of each of these windows meant we could place certain windows above the rest of the experience while keeping others below, even when they’re all open at the same time.
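As a rough illustration of what this looks like in Electron terms (the window name, options, and file paths here are hypothetical, not Loom’s actual configuration), each component becomes its own frameless, transparent BrowserWindow whose stacking level we control at the operating system level:

```ts
import { BrowserWindow } from "electron";

// A minimal sketch: each UI component gets its own frameless, transparent
// window so the OS, not the browser, decides where it can live on screen.
function createCameraBubble(): BrowserWindow {
  const bubble = new BrowserWindow({
    width: 240,
    height: 240,
    frame: false,       // no OS chrome around the bubble
    transparent: true,  // lets the circular camera view float over other apps
    resizable: false,
    skipTaskbar: true,
  });

  // Keep the bubble above every other application while recording.
  bubble.setAlwaysOnTop(true, "screen-saver");
  bubble.setVisibleOnAllWorkspaces(true);

  bubble.loadFile("camera-bubble.html"); // hypothetical renderer entry point
  return bubble;
}
```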
There was one other problem that we noticed quickly. The number of windows we had, combined with the fact that most window actions weren’t direct, was starting to turn into a maintenance nightmare. You can close the camera bubble by pressing the X, but it also closes in 5 other cases throughout our various recording flows. In total, there are 25 actions that result in the camera bubble being closed, opened, repositioned, or brought in front of other windows, most of which have special conditions. Below is a simplified visual of a user starting a custom-size recording:
Three interactions result in fourteen window updates. Because of the nature of the app, we decided to introduce a centralized middleware with one assumption: a window can only be modified as a side effect of another action, never directly.
A user clicks the Custom Size button. We dispatch a custom-size action and, based on that one action, all windows react with their appropriate side effects, be it changing size, changing position, or changing which other windows they sit on top of. Compare this to telling the Preferences window to hide, the Custom Size Selector to show, and the Control Menu to reposition, all within the context of the custom-size function. This gave us maintainable code locality: if a window’s state needs to change, we know where to look, and overall we write less code.
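A minimal sketch of that idea, assuming a Redux-style dispatch (the action names and controllers below are illustrative, not Loom’s actual code):

```ts
// Window-state changes happen only here, as side effects of dispatched actions.
type WindowAction = {
  type: "custom-size-selected" | "recording-started" | "camera-bubble-closed";
};

interface WindowController {
  name: string;
  handle(action: WindowAction): void;
}

// Hypothetical controllers; real ones would wrap Electron BrowserWindows.
const preferences: WindowController = {
  name: "preferences",
  handle(action) {
    if (action.type === "custom-size-selected") console.log("preferences: hide");
  },
};

const customSizeSelector: WindowController = {
  name: "custom-size-selector",
  handle(action) {
    if (action.type === "custom-size-selected") console.log("custom-size-selector: show");
  },
};

const windows = [preferences, customSizeSelector];

// The single entry point: UI code dispatches an action, never a window command.
function dispatch(action: WindowAction): void {
  windows.forEach((w) => w.handle(action));
}

// e.g. the Custom Size button only ever does this:
dispatch({ type: "custom-size-selected" });
```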
Now we had a way to surface our UI out of the browser and a way to orchestrate it. We made it maintainable and, most importantly, extensible without the caveats of being in a browser window. All that was left was to record the whole thing in 4k. Easy, right? 🙃
Designing a Video Infrastructure to Support 4k
Our recording offering, at the time, was capable of 720p recordings and was built on a video infrastructure where more recordings meant more recording servers, more transcoding servers, and larger storage costs. With 4k recordings, file sizes could increase as much as 15x from our lowest 720p bitrate, which meant even more recording and transcoding servers on top of the increased storage costs.
The functionality of these recording servers was to take the raw video data the browser was producing, transmux it, and make the resulting video file streamable. Making a video streamable means moving some metadata to the front of the video file so the browser can seek to different parts of the video without having to download the entire file first. Because this process has low CPU and memory overhead, we decided it would be best to move it to the client’s machine. The new flow looked something like this:
Fewer moving parts and one less set of servers to pay for. The advantages of this new, simpler system go beyond overall stability and unit-economics improvements.
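For a sense of what that streamable step amounts to, here is a hedged sketch using FFmpeg from a Node/Electron process (the binary path and filenames are hypothetical): the streams are copied untouched and only the metadata is moved to the front of the file.

```ts
import { execFileSync } from "child_process";

// Hypothetical path to the ffmpeg binary bundled with the desktop app.
const ffmpegPath = "/path/to/bundled/ffmpeg";

// "-c copy" leaves the audio/video streams untouched; "+faststart" rewrites
// the file with the MP4 metadata (the moov atom) at the front so a player
// can seek before the whole file has downloaded.
execFileSync(ffmpegPath, [
  "-i", "recording.mp4",
  "-c", "copy",
  "-movflags", "+faststart",
  "recording-streamable.mp4",
]);
```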
Because we now had access to native machine resources, we could choose what format we wanted to output. We bundled a tiny part of FFmpeg, a video processing software suite, with our app, so we can take the .mp4 files that the operating systems’ APIs produce while recording the screen and convert them to a format called HLS. At a high level, the HLS streaming format is a playlist of files rather than one large file.
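A sketch of that conversion using FFmpeg’s HLS muxer (segment duration and filenames below are illustrative, not necessarily what Loom ships):

```ts
import { execFileSync } from "child_process";

const ffmpegPath = "/path/to/bundled/ffmpeg"; // hypothetical bundled binary

// Re-wrap the recorded MP4 into an HLS playlist of short segments without
// re-encoding, so the result is ready as soon as the copy finishes.
execFileSync(ffmpegPath, [
  "-i", "recording.mp4",
  "-c", "copy",               // no re-encode, just change the container
  "-f", "hls",
  "-hls_time", "6",           // ~6-second segments
  "-hls_playlist_type", "vod",
  "-hls_segment_filename", "segments/chunk_%03d.ts",
  "segments/playlist.m3u8",
]);
```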
If something goes wrong during any part of the recording process, the entire video is no longer lost. Only the affected part is lost, and the rest of the video can be recovered just by modifying the playlist file to exclude the bad part. We have had customers lose 3-hour recordings because of an error that happened towards the end of the recording. While we have since fixed the issues we found from those reports, those users never got that time back. Loom is a time-saving tool, and we knew we had to implement a better way to account for the unknown unknowns.
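To make the recovery idea concrete, here is a minimal sketch (not Loom’s actual tooling) of dropping one corrupted segment from an HLS playlist; each media entry in the playlist is an #EXTINF line followed by the segment filename:

```ts
// Remove a corrupted segment from an HLS playlist so the rest of the
// recording still plays. badSegment is e.g. "chunk_042.ts".
function dropSegment(playlist: string, badSegment: string): string {
  const lines = playlist.split("\n");
  const kept: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    // Skip the "#EXTINF:<duration>," line and the bad segment's filename.
    if (lines[i].startsWith("#EXTINF") && lines[i + 1]?.trim() === badSegment) {
      i++;
      continue;
    }
    kept.push(lines[i]);
  }
  // A production version would also insert an #EXT-X-DISCONTINUITY tag at the cut.
  return kept.join("\n");
}
```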
Lastly, HLS is much more widely supported than .webm. On the Chrome Extension, a user could share a video link right away, but the video could not be viewed on Safari, mobile, or older browsers until we reprocessed the entire video file for compatibility. The longer the video, the longer this process took. HLS allows users to share their content immediately across all modern browsers.
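Safari plays HLS natively; other browsers typically play it through Media Source Extensions. A common pattern for that, shown here with the open-source hls.js library as an illustration rather than a statement about Loom’s player, looks like:

```ts
import Hls from "hls.js";

const video = document.querySelector<HTMLVideoElement>("video")!;
const src = "https://cdn.example.com/recording/playlist.m3u8"; // hypothetical URL

if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Safari and iOS play HLS natively.
  video.src = src;
} else if (Hls.isSupported()) {
  // Everywhere else, hls.js feeds the segments through Media Source Extensions.
  const hls = new Hls();
  hls.loadSource(src);
  hls.attachMedia(video);
}
```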
Come Join Us!
If you’d like to learn more about the lessons we learned while shipping Loom for desktop and the entire Pro suite of features, check out Vinay’s post.
The experience of building this product and the results that followed made us realize that we must continue to invest in uncompromising immediacy. With Loom for desktop, this means decreasing our app payload size, sharing login sessions to reduce onboarding friction, and building seamless update and rollback systems, as well as a suite of video-centric features that all have speed built into their core.
Uncompromising immediacy, along with elements of the features mentioned above, lives across our entire product. If this is something you’d be interested in building, check out our open roles. If you want to discuss any of these topics further, don’t hesitate to shoot me an email or a DM on Twitter.