Wordblazer

A transcription editor by a transcriptionist, for a transcriptionist.

Features

  • Local storage
  • Autosave
  • Keyboard shortcuts
  • Text expansion
  • WebHID pedal support

Tech Stack

  • SvelteKit

Overview

The transcription software interface with an interview transcript open.
Main interface of the transcription editor.

I spent a few years doing freelance audio-to-text transcription, so I say from experience that when transcribing audio, an efficient workflow is key. There were certain features that I knew I wanted in a transcription editor, but I couldn't find them in any of the programs I had access to. As an aspiring front-end developer, I thought: why not go ahead and implement my ideal editor myself?

Local data storage

{
  id: "cc311b8a-af76-4e87-a5ff-30b50a9e9cdb",
  name: "Jay on Media Representation and Resistance",
  date: "2022-05-25T19:32:03.387Z",
  notes: "from https://www.youtube.com/watch?v=9h9De73ONVA",

  audio: {
    key: "453b877c-5070-49db-a6f8-b199187fabc7",
    name: "Trans Oral History Project - Jay on Media Representation and Resistance.mp3",
    currentTime: 21.016093
  },
    
  speakers: [    
    { id: "2bffb251-d35d-47da-8edc-58de9fac65a5", name: "Jay" }
  ],
  
  groups: [
    {
      id: "03cc5d85-2f47-48de-85e9-3a5b6c40d213",
      speakerId: "2bffb251-d35d-47da-8edc-58de9fac65a5",
      paragraphs: [
        {
          id: "c97c8378-17d2-4a1b-9631-fa23136e8b4f",
          timestamp: 0,
          text: "We were bringing as best we could this awareness to the Sunday programs, to every other type of community building activity that we had."
        }
      ]
    }
  ]
}
Sample document structure. Audio is stored as a blob in local browser storage. Each group contains a reference to its speaker and an array of paragraphs; each paragraph holds its text and the timestamp at which it begins.
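For readers who think in types, the sample above could be described roughly like this. The interface names and the speakerNameFor helper are my own illustrative assumptions, not names taken from the Wordblazer source.

```typescript
// Hypothetical type names; only the field names come from the sample document.
interface Speaker { id: string; name: string }
interface Paragraph { id: string; timestamp: number; text: string }
interface Group { id: string; speakerId: string; paragraphs: Paragraph[] }
interface AudioRef { key: string; name: string; currentTime: number }

interface TranscriptDocument {
  id: string;
  name: string;
  date: string; // ISO 8601 timestamp
  notes: string;
  audio: AudioRef; // `key` points at the separately stored audio blob
  speakers: Speaker[];
  groups: Group[];
}

// Resolve a group's speakerId to a display name, or undefined if no speaker matches.
function speakerNameFor(doc: TranscriptDocument, group: Group): string | undefined {
  return doc.speakers.find((speaker) => speaker.id === group.speakerId)?.name;
}
```

Keeping only a speakerId on each group means renaming a speaker is a single edit in the speakers array.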

Since I planned on making this editor for myself rather than as any kind of software-as-a-service, there was no point in worrying over server-side storage and all the complications that entails. Instead, I opted for local data storage from the get-go.

The localForage wrapper around browser storage made implementing locally stored documents a breeze. Each document is stored as a structured JSON object containing its speakers and paragraphs, along with a pointer to the associated audio, which is stored as a blob. I also implemented a debounced autosave feature that saves the document after every few keystrokes, and I am happy to say that I never once lost my work.
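A debounced autosave can be sketched in a few lines. This is a minimal illustration rather than the actual Wordblazer code: createAutosaver, the Timers type, and the one-second default delay are all assumptions, and the save callback stands in for whatever writes the document to storage (for example a call to localForage's setItem).

```typescript
// Injectable timer functions so the debounce logic can be exercised without real delays.
type Timers = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Hypothetical helper: saves only after `delayMs` has passed with no further edits.
function createAutosaver<T>(
  save: (doc: T) => void,
  delayMs = 1000,
  timers: Timers = realTimers,
) {
  let handle: unknown;
  let pending: { doc: T } | undefined;

  // Save the most recent pending document right away, if there is one.
  const flush = (): void => {
    if (handle !== undefined) timers.clear(handle);
    handle = undefined;
    if (pending) {
      const { doc } = pending;
      pending = undefined;
      save(doc);
    }
  };

  // Call on every edit: each call restarts the countdown.
  const schedule = (doc: T): void => {
    pending = { doc };
    if (handle !== undefined) timers.clear(handle);
    handle = timers.set(flush, delayMs);
  };

  return { schedule, flush };
}
```

Calling schedule from the input handler and flush when the page is about to close would cover both the steady-typing case and the sudden-exit case.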

Stub expansion

Demonstration of stub expansion. "yr" is replaced with "your", and "bsns" is replaced with "business".

Thanks to many late nights as a teenager chatting on AIM, I have a pretty high WPM — but it’s not so high that I can always keep up with the WPM of someone speaking. An essential feature for any efficient transcription editor is the ability to use expansions to minimize the number of keystrokes it takes to input a word or phrase.

I implemented expansions with a simple JavaScript object where each key corresponded to a stub (“bsns”) and the value, its expansion (“business”). An input event handler detects whether the last input character was a word delimiter, and if so, it checks if the previous “word” is a stub and replaces it with the expansion.

Though the concept is simple, some logic is necessary around making sure the text cursor is kept in the correct position, since the length of the string in our input box can change once the replacement is done. And it’s not enough to move the cursor to the end of the string, since the user could be editing text in the middle. Thankfully, the browser is able to handle the replacement lookup and cursor math with ease, so the slowest part of the system is (and always should be) the typist’s fingers.
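The replacement and cursor math described above can be sketched as a pure function. This is a simplified illustration under my own assumptions: the expandStub name, the delimiter set, and the sample stub table are not from the Wordblazer source.

```typescript
type StubMap = Record<string, string>;

// Assumed set of word delimiters: whitespace and common punctuation.
const DELIMITER = /[\s.,;:!?]/;

// Given the text and cursor position right after an input event,
// expand the word before a just-typed delimiter if it is a known stub.
function expandStub(
  text: string,
  cursor: number,
  stubs: StubMap,
): { text: string; cursor: number } {
  // The character just typed sits immediately before the cursor.
  if (!DELIMITER.test(text.charAt(cursor - 1))) return { text, cursor };

  // Walk backwards from the delimiter to find where the previous "word" starts.
  const wordEnd = cursor - 1;
  let wordStart = wordEnd;
  while (wordStart > 0 && !DELIMITER.test(text.charAt(wordStart - 1))) wordStart--;

  const word = text.slice(wordStart, wordEnd);
  // Own-property check so words like "constructor" are never treated as stubs.
  const expansion = Object.prototype.hasOwnProperty.call(stubs, word) ? stubs[word] : undefined;
  if (expansion === undefined) return { text, cursor };

  // Splice in the expansion and shift the cursor by the change in length,
  // which keeps it in place even when editing in the middle of the text.
  return {
    text: text.slice(0, wordStart) + expansion + text.slice(wordEnd),
    cursor: cursor + (expansion.length - word.length),
  };
}
```

In an input handler, the result would be written back to the textarea's value and the caret restored with setSelectionRange(cursor, cursor).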

Automatic input resizing

Demonstration of input resizing when text is added and removed.

How is the input in Wordblazer implemented? I chose to forgo rich text editing, which meant the humble <textarea> should be enough to do the job. However, by default, the size of a <textarea> is fixed, and a content overflow results in an internal scrollbar. Every paragraph of dialog is a <textarea> in Wordblazer, and I needed all of these to resize dynamically with their content.

As it turns out, auto-resizing a <textarea> efficiently is a nontrivial problem. The simplest solution is to set the height of the <textarea> equal to its scrollHeight. However, even just reading scrollHeight forces a layout reflow — and if we have to check the size after every input keystroke, this is a performance bottleneck we can’t afford!

After much experimentation, I settled on a solution in which the <Paragraph> component consisted of a relatively-positioned <div> containing both a <textarea> and an absolutely-positioned “content clone” <div> with identical styling. I attached the content clones to a global ResizeObserver, which detects when a clone resizes and dynamically adjusts the size of its <textarea> sibling to match.

Though the content duplication felt a touch “hacky”, I found the ResizeObserver approach quite performant. The content clones, being absolutely positioned, would not trigger reflows themselves. But when one did resize, we could use its bounding rectangle to set the correct height for our input area, thus minimizing the number of reflows necessary.
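The shared-observer idea can be sketched as follows. This is an approximation rather than the actual component code: the function names and the WeakMap pairing are my own assumptions, and I use the entry's contentRect height where the real implementation may read the bounding rectangle differently.

```typescript
// Minimal structural types so the height logic does not depend on the DOM typings.
type ResizeEntryLike = { target: object; contentRect: { height: number } };
type HeightTarget = { style: { height: string } };

// Maps each content clone to the <textarea> it mirrors.
const textareaForClone = new WeakMap<object, HeightTarget>();

// Observer callback: copy each resized clone's height onto its paired textarea.
function applyCloneHeights(entries: ResizeEntryLike[]): void {
  for (const { target, contentRect } of entries) {
    const textarea = textareaForClone.get(target);
    if (textarea) textarea.style.height = `${contentRect.height}px`;
  }
}

// One observer shared by every paragraph, created only where the API exists.
const ResizeObserverCtor = (globalThis as any).ResizeObserver;
const sharedObserver = ResizeObserverCtor ? new ResizeObserverCtor(applyCloneHeights) : null;

// Hypothetical helper to call when a paragraph mounts; returns a cleanup function.
function pairCloneWithTextarea(clone: object, textarea: HeightTarget): () => void {
  textareaForClone.set(clone, textarea);
  sharedObserver?.observe(clone);
  return () => {
    sharedObserver?.unobserve(clone);
    textareaForClone.delete(clone);
  };
}
```

The clone still has to mirror the textarea's text on every input, and if the clone has padding, the border-box size may be the better measurement.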

Pedal support with WebHID

As of early 2022, the WebHID API is supported only in Chromium-based browsers. This API lets developers interact with devices through the low-level Human Interface Device protocol, bypassing the need for specific device drivers at the OS or user agent level. I was pretty psyched to discover the WebHID API, because it meant that I could hook my transcription pedal (a “niche device” that lost hardware support after Windows 7) up to Wordblazer for hands-free audio control!

I created a very simple <FootPedal> component that displayed a button to connect to the pedal only if the browser supported WebHID, and attached a handler to inputreport events that would play and pause the audio according to whether the center pedal was pressed down or not. With a mere 47 lines of code, my workflow became that much more buttery smooth (buttery smoother??). I look forward to seeing what other possibilities will arise for WebHID as support for the API grows.
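The core of such a handler might look something like the sketch below. Treat the details as assumptions: the bit mask for the center pedal is a placeholder (the real report layout depends on the specific pedal), the function names are mine, and the empty filters list simply lets the browser's device chooser show every HID device.

```typescript
// ASSUMPTION: the center pedal is bit 1 of the first report byte.
// This is a placeholder; check the real layout for your own device.
const CENTER_PEDAL_MASK = 0b010;

// Decode an input report into "is the center pedal currently held down?"
function isCenterPedalDown(data: DataView): boolean {
  return data.byteLength > 0 && (data.getUint8(0) & CENTER_PEDAL_MASK) !== 0;
}

type AudioLike = { play(): unknown; pause(): void };

// Hypothetical connect routine; must be called from a user gesture such as a button click.
async function connectPedal(audio: AudioLike): Promise<boolean> {
  const hid = (globalThis as any).navigator?.hid;
  if (!hid) return false; // WebHID is not available in this browser

  // An empty filter list shows all HID devices in the browser's chooser.
  const [device] = await hid.requestDevice({ filters: [] });
  if (!device) return false; // the user dismissed the chooser

  await device.open();
  device.addEventListener("inputreport", (event: { data: DataView }) => {
    // Play while the pedal is held, pause as soon as it is released.
    if (isCenterPedalDown(event.data)) void audio.play();
    else audio.pause();
  });
  return true;
}
```

Passing an HTMLAudioElement as the audio argument gives hold-to-play behaviour, and a false return value is a natural signal to hide the connect button.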

Takeaways

There are a number of other features I implemented that I didn’t go into here. The most complex thing that I did not cover is the input and navigation logic for the document itself, which handles moving between speakers and paragraphs using Enter and  in a way that is fully transparent to the transcriptionist. There are also keyboard shortcuts for playing, pausing, and adjusting the audio speed, creating and switching speakers, and adding and searching stub expansions. Additionally, there are convenience autocorrect functions, like autocapitalization of sentences and space/punctuation swapping (for when I accidentally put the space before a period or comma).

This was a large-scale project that I personally dogfooded over several months, adding enhancements as I went. My first prototypes were in Vue, and I rewrote the editor from scratch at least twice before moving to the current Svelte implementation.

I learned about reflows and the risk of layout thrashing, and about being mindful of performance where it really matters. I also made heavy use of custom events in my Svelte components, and implemented a custom store for my document objects so they would be updated and saved efficiently.

The most glaring omission, and the most requested feature (by me), is a robust undo/redo stack. Although I’ve been doing less transcription these days, this would be an interesting project to pick up sometime, and it would make the editor more useful and user-friendly to others. I imagine this could be implemented by extending the custom document store with state management. Naively, this could mean keeping N full copies of the document in memory, and an “undo” command restoring a previous copy; a more elegant version might involve tracking changes with invertible patches, perhaps with the help of a library like Immer.
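The naive version, keeping up to N full copies, might look something like this. It is a sketch of the idea only: createHistory and its method names are hypothetical, it is not wired into any Svelte store, and it assumes documents are plain JSON so they can be copied with a JSON round trip.

```typescript
// Hypothetical snapshot-based history; keeps at most `limit` undo steps.
function createHistory<T>(initial: T, limit = 50) {
  // Documents are plain JSON, so a JSON round trip is enough for a deep copy.
  const clone = (value: T): T => JSON.parse(JSON.stringify(value));

  let present = clone(initial);
  const past: T[] = [];
  const future: T[] = [];

  return {
    // Hand out a copy so callers cannot mutate the stored snapshot.
    get current(): T {
      return clone(present);
    },
    // Record a new state; any redo history is discarded.
    commit(next: T): void {
      past.push(present);
      if (past.length > limit) past.shift(); // drop the oldest snapshot
      present = clone(next);
      future.length = 0;
    },
    // Step back one snapshot; returns false when there is nothing to undo.
    undo(): boolean {
      const previous = past.pop();
      if (previous === undefined) return false;
      future.push(present);
      present = previous;
      return true;
    },
    // Step forward again; returns false when there is nothing to redo.
    redo(): boolean {
      const next = future.pop();
      if (next === undefined) return false;
      past.push(present);
      present = next;
      return true;
    },
  };
}
```

Memory use grows with document size times the limit, which is the cost that a patch-based approach would avoid.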