Charlene Chambliss
2024 Feb 21
(NOTE: This post was originally published on Aquarium's Tidepool blog. Aquarium has since been acquired by Notion.)
Why you should care about your app's UX
LLM-powered applications, particularly those with chat interfaces, appear to have significant problems with user churn. For a multitude of reasons, including not understanding how to use the app, getting bad or unreliable results, or simply finding the app too tedious to use, users often abandon these tools nearly as quickly as they came. Knowledge workers are tired of juggling multiple apps, and are looking for ways to pare down their tool stack to the bare minimum needed to get their work done.
In addition, the influx of cash into generative AI applications has created a heated market with significant competition in every vertical. Chances are, the problem your app is solving is also being tackled by multiple other companies, and everyone is in the early stages right now, meaning there is often rough feature parity between different apps solving the same problem.
On a personal level, I've tried dozens of LLM tools and apps over the last year or so, and only a handful have stuck with me and made it into my daily workflow. Those tools and apps all have standout UX that simply makes them easier and more pleasant to use than competing tools. Perhaps most importantly, getting started with these tools was nearly as easy as sticking to my existing workflow, and after a modest initial learning curve there has been a definite payoff in productivity.
Not every pattern here will be applicable to every use case, but many of these apply across different environments as well as across different problems to be solved. Chances are, however, that your most engaged users have already been asking (directly or indirectly) for some of these features and conveniences. Pick and choose individual strategies to implement from these patterns, or use them as inspiration for a feature that may fit your tool even better, and over time you'll have a darned good app.
We'll be taking cues from some of the most successful AI-native and AI-enhanced tools currently on the market: Raycast, Arc Browser, Perplexity, Notion, Midjourney, GitHub Copilot, Sourcegraph Cody, Gmail, and our very own Tidepool.
Without further ado, here are eight patterns that make for great generative AI applications.
Have a previously-static box do double duty
Examples: Arc, Raycast AI
This nifty pattern allows users to continue using their existing workflows, with the option of exploring new functionality when desired, or when the "old" functionality doesn't return any results.
Arc's search box, opened with the familiar CMD+F, doubles as both a search box and an integrated chat-with-this-webpage interface.
Arc's CMD+F "find" box
In addition to being perfectly usable for search, the box accommodates non-questions, such as a request to summarize this Wikipedia page:
Arc's summary of DJ horsegiirL's Wikipedia page
And when answering a question, the model has clearly been set up to provide a TL;DR for the answer, in line with the expectations of a user who originally just wanted to "Find" something:
Arc answering a question about the page
Arc also integrates the generated answer directly back into the browsing experience; clicking "Find on Page" will scroll to and highlight the quoted text.
Arc highlighting the quote provided in the previous image after clicking "Find in Page"
Similarly, Raycast has the ability to invoke Ask AI on any text you type into the app/command search field, providing convenient one-button access to answers, while also gently introducing users to a feature they may not have tried yet.
Raycast's search bar doubling as an Ask AI entrypoint (Ask AI itself provides an entrypoint into a multi-turn AI Chat later, if desired)
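The core of this pattern is a simple dispatch: try the "old" behavior first, and fall back to the AI feature when it comes up empty. A minimal sketch in Python, where ask_llm is a hypothetical placeholder for whatever chat backend the app uses:

```python
# Sketch of a "double duty" search box: attempt a literal find first,
# and only fall back to an AI answer when nothing matches.

def ask_llm(question: str, page_text: str) -> str:
    # Placeholder for a real LLM call against the page contents.
    return f"AI answer about: {question}"

def handle_search_box(query: str, page_text: str) -> dict:
    # Collect every offset where the query appears verbatim.
    matches = [i for i in range(len(page_text))
               if page_text.startswith(query, i)]
    if matches:
        return {"mode": "find", "match_offsets": matches}
    # No literal matches: treat the input as a question for the model.
    return {"mode": "chat", "answer": ask_llm(query, page_text)}
```

Real implementations (like Arc's) expose both modes at once rather than silently switching, but the fallback ordering is what lets existing workflows keep working unchanged.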
Turn common intents (prompt types) into commands
Examples: Sourcegraph Cody and GitHub Copilot, Notion
This pattern is great for users because it allows them to take a workflow that normally requires a lot of typing and setup and turn it into an invokable command, which can then be bound to a hotkey. This also helps to concretize "what you can do" with the tool, for users who aren't used to natural-language interfaces and don't yet have their own ideas.
Sourcegraph Cody's list of default commands (and one of my custom commands)
Bonus points: also group commands by type, making it easier to skim the list to find what the user is looking for.
A partial list of Notion's AI commands, grouped by intent (write, generate, edit)
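Under the hood, a command is usually just a named, reusable prompt template. A minimal sketch of a grouped command registry; the group names and templates here are illustrative, not taken from any particular product:

```python
# Common intents as named commands, grouped by intent so the
# command palette is easy to skim.

COMMANDS = {
    "edit": {
        "fix-grammar": "Fix the grammar in the following text:\n{input}",
        "make-concise": "Rewrite the following text to be more concise:\n{input}",
    },
    "generate": {
        "summarize": "Summarize the following text in 3 bullet points:\n{input}",
    },
}

def build_prompt(group: str, command: str, user_input: str) -> str:
    # Expand the stored template with the user's selected text.
    return COMMANDS[group][command].format(input=user_input)

def list_commands() -> list[str]:
    # Grouped, sorted listing for display in a command palette.
    return [f"{g}/{c}" for g, cmds in sorted(COMMANDS.items())
            for c in sorted(cmds)]
```

From here, binding a hotkey is just a mapping from key combination to a (group, command) pair.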
Turn common prompt elements into knobs and settings
Examples: Midjourney, "AI Profiles" or system instructions as seen in ChatGPT, Perplexity, Raycast AI
This is a pattern commonly seen in image generation interfaces. When a text input box is the primary means of specifying what you want, repeatedly specifying certain settings and preferences can get tedious.
While Midjourney is currently working on an alpha of a fully-featured website interface for generating images with their model, they also offer similar functionality with the Discord bot. /prefer suffix can be used to attach a suffix to all generations performed after changing it, which allows easily reusing a particular phrase or a set of parameters (e.g. --ar 3:2 --chaos 20 --style raw --seed 6).
Some of these commonly tweaked options can also be changed via the /settings command, which provides convenient buttons.
Midjourney Bot's /settings module
This helps the user do less typing and copying-and-pasting. It's even more important to have something like this for LLM apps on mobile platforms, where typing is several times slower for most people.
Another example is Raycast AI, which allows you to set system instructions on a chat-by-chat basis instead of only globally for your entire account. System instructions are typically used to set the subject or tone of a conversation, or to enforce rules about how the model should respond ("be concise," "respond in Portuguese only," etc.).
Raycast's UI for changing conversation-level settings such as model, temperature, or system instructions
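The pattern above amounts to storing the repeated prompt elements once and applying them to every request automatically. A minimal sketch, with hypothetical field names; the message format mirrors the common system/user chat convention:

```python
# Per-conversation "knobs": a stored suffix and system instruction
# are applied to every request instead of being retyped each time.
from dataclasses import dataclass

@dataclass
class ConversationSettings:
    system_instruction: str = ""
    prompt_suffix: str = ""  # e.g. "--ar 3:2 --style raw"

def build_request(user_prompt: str, settings: ConversationSettings) -> dict:
    prompt = user_prompt
    if settings.prompt_suffix:
        # Reattach the saved parameters to every generation.
        prompt = f"{prompt} {settings.prompt_suffix}"
    messages = []
    if settings.system_instruction:
        messages.append({"role": "system",
                         "content": settings.system_instruction})
    messages.append({"role": "user", "content": prompt})
    return {"messages": messages}
```

The UI then only needs to expose the settings object as buttons or form fields, which is exactly what /settings-style panels do.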
Start with suggestions
Examples: ChatGPT (and GPTs on the GPT Store), Perplexity, Tidepool, many others
"Blinking cursor syndrome" affects us all: when seemingly anything is possible, what do you try first? The best LLM apps have figured out that it helps new and experienced users alike to provide some suggestions on what can be done with the tool.
This technique increases stickiness and decreases the likelihood of new users bouncing, and also doubles as a convenient method for introducing new features to repeat visitors, who may not be aware of what's been updated since the last time they used it.
ChatGPT's landing page, which provides several ideas for where to start (as well as a disclaimer about information accuracy)
Perplexity's Discover tab acts as sort of a news feed, giving users an entry point into what other people are curious about right now. Users can very easily start their own chats on the same topic, then wander in whatever direction they'd like.
Perplexity's Discover tab, which includes many interesting news stories to start a conversation about
Perplexity additionally places suggestions for "related" questions at the bottom of each answer, providing an easy way for the user to continue with the conversational thread without having to type out some of the more obvious follow-up questions.
Perplexity's "Related Questions" component, which displays at the bottom of each answer
In our Quickstart Guide for Tidepool, we show off our Suggested Attributes feature. This feature takes in some context about the data itself (pictured here), as well as what kind of analysis the user is interested in predicting (not pictured), and uses it to detect Attributes that are correlated with target metadata, or otherwise aligned with the user's preferences:
The bottom portion of Tidepool's Suggest Attributes form, displaying the list of attributes that were detected in an airline support dataset
Make AI a supporting feature, not the main draw
Examples: Gmail and Google Docs autocomplete, GitHub Copilot, Notion
This can be a particularly good strategy for consumer apps which are used by a wide variety of demographics and which need to be very careful about changing anything about their existing design, for fear of angering or confusing longtime users.
My partner, who isn't as interested in today's AI shenanigans as I am, was typing into Gmail one day, and I watched as Gmail's algorithm correctly predicted many words that he was typing, but he never accepted the suggestions. I asked, "Do you ever accept the suggestions that Gmail gives you?" and he said, "What?"
He, a daily user of Gmail for 10+ years, had not even noticed the suggestions as they were coming through, presumably because (a) he's a fast typist and is generally just focused on finishing writing the email, and (b) the suggestions are actually quite subtle, only appearing as "shadow text" one or two words at a time, and are thus pretty easy to ignore.
Google Docs, correctly predicting what I was about to write in this post
Perhaps some might consider this too subtle an implementation, since even a daily user of Gmail didn't realize what was happening there, but it is a surefire way to avoid angering existing users when introducing a potentially controversial feature. The feature is easy to use while also being nearly impossible to misuse.
For pro users who do want to use the AI features all the time, add the ability to trigger them at will with an easy-to-reach command. For example, Copilot's "automatic suggestion" behavior can be turned off completely, and users can simply invoke the command to trigger a completion when they do want some assistance. This can reduce feelings of the AI "getting in the way" by producing distracting or irrelevant suggestions that interrupt the user's train of thought.
Notion does something similar by providing an unobtrusive shortcut to get to the AI features while the user is in a new block in page-editing mode. Since people don't typically begin a block with a space, Notion can use a very easy-to-reach and easy-to-remember key with low risk of people triggering it accidentally.
Notion's subtle UI hint about how to invoke AI commands
Expand on or rewrite user input behind the scenes to produce a better result
Examples: ChatGPT + DALL-E 3, Tidepool
Anyone running an LLM pipeline (using LLM outputs as LLM inputs) has likely noticed by now that quality can vary drastically depending on user inputs. At best, the model produces a less-than-stellar answer; at worst, your model could get jailbroken and produce unpredictable or problematic output. As such, it's a good strategy to fence off user inputs from the rest of the prompt as much as possible, or perhaps not even use the user input directly.
OpenAI uses this strategy to great effect with DALL-E 3. When a user enters a DALL-E 3 prompt in ChatGPT, the user's prompt is rewritten behind the scenes, specifically in a way that significantly increases the detail of the prompt. Additional information is included, such as the desired focus and composition, the background and other objects in the scene, the lighting, and the vibe that should be evoked.
Generally, the "rewriter" step is instructed not to change any details provided by the user, but rather to augment the prompt with any details that aren't yet present. The rewriter may also do things like fix errors (for example, reinterpret some text written by the user that appears to be a typo rather than their real intent). This helps shorten the feedback loop for users, who might otherwise end up with an unintended generation after a long wait.
Me asking DALL-E 3 for a "picture of a puppy," with no further details
The much longer and more comprehensive prompt it was rewritten to in the background, shown on expanding ChatGPT's response
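Structurally, the rewrite step is just a second model call that runs before the real one. A minimal sketch, assuming a placeholder call_llm function; the rewriter instructions here are illustrative, not OpenAI's actual system prompt:

```python
# Sketch of a behind-the-scenes prompt rewriter: keep every detail
# the user gave, add only the missing ones, then pass the expanded
# prompt to the downstream generator.

REWRITER_INSTRUCTIONS = (
    "Expand the user's image prompt with details about composition, "
    "background, lighting, and mood. Do NOT change or remove any detail "
    "the user already specified; only add details that are missing."
)

def call_llm(system: str, user: str) -> str:
    # Placeholder for a real model call following `system` instructions.
    return f"{user}, golden-hour lighting, shallow depth of field"

def generate_image_prompt(user_prompt: str) -> str:
    # The user never sees this intermediate step unless they expand it.
    return call_llm(REWRITER_INSTRUCTIONS, user_prompt)
```

Keeping the original prompt intact inside the expansion (rather than replacing it) is what makes the result feel like a refinement of the user's intent instead of a substitution.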
We also leverage this technique in Tidepool: we take the initial description provided by the user about how the text data should be analyzed, and expand it into a more detailed labeling guide for the model behind the scenes, which allows us to create a high-quality zero-shot classifier for their desired analysis.
This technique should be used carefully, as some users could get annoyed by the model making assumptions about what they wanted. Be sure to use it in contexts where the result will not imply that your app is passing judgment on the quality, utility, or validity of the user's original input.
Add affordances for selecting particular pieces of context
Examples: Perplexity, Sourcegraph Cody, GitHub Copilot
For use cases that rely on RAG (retrieval-augmented generation) to function effectively, retrieving the right context is extremely important. Oftentimes, the user has a better sense of which context is required than whatever system is in place to automatically retrieve context based on the initial query.
For this reason, it's a good idea to allow the user to selectively specify context that should definitely be included in the retrieved documents. This can augment the existing retrieval system and produce better answers than the model could have produced from only the automatically-retrieved documents.
Perplexity has the ability to select a "focus," for when the user preferentially wants the system to read from a particular source or set of sources. "Reddit" focus is one of my most-used focuses when I want real people's experiences with something, because it will only reference search results from Reddit. Oftentimes, I prefer this to the more "curated" information coming from SEO-optimized review websites, which is what I would get with the default focus.
"Attach" also allows the user to directly upload a document of their choosing that will be exclusively referenced while generating an answer.
Perplexity's "Focus" panel, allowing users to steer the search results referenced by the model
Sourcegraph Cody and GitHub Copilot both also provide examples of this. Copilot allows you to add @workspace to the beginning of your query to ensure Copilot answers your question while referencing context from your codebase. Cody goes even further, allowing you to use @ to include specific files, or even @# to include specific symbols (e.g. functions, classes, and so on), so that you don't have to just cross your fingers and hope that the retrieval system finds the right context within your entire workspace.
Cody allowing me to add specific files to be referenced when answering my question
Cody allowing me to include particular classes, functions, and variables when answering my question
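All of these affordances reduce to the same merge logic: user-pinned context always makes the cut, and the automatic retriever fills the remaining slots. A minimal sketch, where retrieve is a naive keyword-match stand-in for a real embedding search:

```python
# Sketch of merging user-selected context with automatic retrieval.

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    # Placeholder retriever: naive keyword match instead of vector search.
    words = query.lower().split()
    hits = [doc for doc in corpus
            if any(w in doc.lower() for w in words)]
    return hits[:k]

def gather_context(query: str, corpus: list[str],
                   pinned: list[str], max_docs: int = 5) -> list[str]:
    # User-pinned documents are always included, in order.
    context = list(pinned)
    for doc in retrieve(query, corpus, k=max_docs):
        if doc not in context and len(context) < max_docs:
            context.append(doc)
    return context
```

The @workspace, @, and @# affordances are essentially different UIs for populating the pinned list at different granularities.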
Make verification and fact-checking easy for users
Examples: Arc, Perplexity, GitHub Copilot, Sourcegraph Cody
When asking Arc a question about a long page, it will actively warn you that it was only able to fit X% of the page into the model's context window. And when clicking on "Find on Page" for a provided quote, if it's not able to find that exact quote, it will warn you that the quote may have been hallucinated by the model.
Arc helpfully letting me know that the Wikipedia page for "Electronic dance music" was too long for it to review the whole thing, and additionally indicating that the quote provided in the answer may not be a real quote from the article
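A check like Arc's can be approximated with a verbatim-match test against the source text. A minimal sketch (not Arc's actual implementation), normalizing whitespace so line wrapping does not cause false alarms:

```python
# Sketch of a quote-verification check: if a model-produced quote is
# not found verbatim in the source page, flag it as possibly
# hallucinated instead of presenting it as fact.

def verify_quote(quote: str, page_text: str) -> dict:
    # Collapse all whitespace runs so wrapping/newlines don't matter.
    norm_page = " ".join(page_text.split())
    found = " ".join(quote.split()) in norm_page
    return {
        "found": found,
        "warning": None if found else (
            "This quote may have been generated by the model rather "
            "than taken verbatim from the page."
        ),
    }
```

Surfacing the warning inline (rather than silently dropping the quote) is what lets users calibrate their trust in the rest of the answer.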
Perplexity is another obvious example, prominently providing the list of sources for questions, along with their inline citation numbers. We can see that this specific claim about the release date is purported to be present in at least 3 sources, making it perhaps more trustworthy than a claim with only one backing source.
Perplexity providing a source list and several inline citations for its answer.
The following example from GitHub Copilot is a particularly illustrative one for this UX pattern, because Copilot seems to have referenced none of the correct files for my question, and as a result produced a low-quality (although partially correct) answer. As a user, seeing that the referenced files are wrong helps me level-set my expectations as to how likely the model is to be accurate when answering my query. This helps users avoid wasting time due to taking false advice or information from the model.
Using the previously-mentioned @workspace shortcut to ask Copilot about our frontend codebase; Copilot returned a list of irrelevant files as its references, so I know I can disregard the answer
Conclusion
We're still only in the very beginning stages of developing LLM-powered interfaces, but there have been some early lessons learned in these first couple of years of building in earnest.
The most important themes here are to:
- Onboard gradually and unobtrusively, by integrating AI features into default workflows and giving gentle nudges and suggestions
- Empower the user to do the steering, by making it easy to invoke the AI features only when the user needs them
- Provide affordances to standardize common or frequent tasks, such as knobs, settings, and templates
With this mindset of progressively adding value to the user via suggestions, staying out of the way when they want to be in charge, and empowering pro users to customize the tool for their preferences, we can bring forth the kind of intuitive, empowering, time-saving workflows that the advent of generative AI promises to deliver.