Wysp Docs

Introduction

Wysp is a stateful server-side navigation and geospatial platform that is designed to augment existing LLM-based agents.

It stays invisible to your users, transparently making your agent knowledgeable, consistent, and grounded across geospatial tasks.

This is accomplished by an exchange of context & tools with your inference server, and an exchange of GPS location and audio content with your client (smartphone app or AI-native voice hardware).

The design of the API primarily targets agents that use realtime voice as the interface, and takes great care to be complete and unambiguous in a screenless environment. The APIs are websocket-based and support a wide range of hardware, from flagship devices all the way down to hobbyist-grade microcontrollers.

When fully integrated, Wysp encapsulates all the complexities of placing the agent in space and time alongside the user, built up via 5 internal layers:

[user]

stop at a dep once I’m east of the main

In Montreal, ‘dep’ refers generically to a type of corner store, ‘the main’ is understood to mean Boulevard Saint-Laurent, and ‘east’ is understood to mean northeast.

The foundation of Wysp is a language-based representation of the urban environment that aims to closely match the mental model of the local residents. This data (along with a collection of models, heuristics, and programs) provides grounding of both information and reasoning.

[user]

it’s somewhere downtown

Areas

[user]

I’m facing the yellow building

Visual descriptors

[user]

how can I buy a bus pass?

Task-oriented Index

[user]

what’s a cool spot for a first date in saint cats?

Discovery / research,
Local slang

[user]

find a gas station somewhere across the river

Bisections,
Prepositional scoping

[agent]

Your bus is delayed 3 minutes

Short-term memory,
Realtime transit

[agent]

Face downhill and head towards the church

Slope bearing-finding,
Visual distinctness

[agent]

The entrance is just out of view, around the corner to the right

Sightlines

We additionally produce datasets that cannot be directly interacted with, but which inform routing and communication decisions.

  • Mapping of experience-centered aspects such as attentional demand density, regional costliness of manoeuvre errors, perceived safety, etc.
  • Prominence data (name recognition from local and visitor perspectives, visual distinctness, size) of landmarks, spaces, divisions
  • Perception-based manoeuvres, distances, and directions (what things “feel” like, and how that differs from the ground truth)

Instead of relying on a context window that is bloated with tools, Wysp proactively covers most likely queries by providing a dense and evolving stream of context.

Spatial and temporal content naturally goes stale as time passes and the user’s position changes, so this context is continually updated and compacted.
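The update-and-compact loop can be pictured as a filter over context chunks that expire in time and in space. This is an illustrative sketch only; the names (`ContextChunk`, `compact`, the TTL/radius fields) are assumptions, not Wysp’s actual schema.

```python
# Sketch: compacting a proactive context stream as the user moves.
# All names and fields here are illustrative, not Wysp's API.
import math
import time
from dataclasses import dataclass

@dataclass
class ContextChunk:
    text: str          # natural-language context destined for the LLM
    lat: float
    lng: float
    created_at: float  # unix seconds
    ttl_s: float       # how long this chunk stays coherent
    radius_m: float    # how far away it stays relevant

def _meters(lat1, lng1, lat2, lng2):
    # Equirectangular approximation; adequate at city scale.
    x = math.radians(lng2 - lng1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(x, y)

def compact(chunks, user_lat, user_lng, now=None):
    """Drop chunks that have gone stale in time or been left behind in space."""
    now = time.time() if now is None else now
    return [
        c for c in chunks
        if now - c.created_at <= c.ttl_s
        and _meters(c.lat, c.lng, user_lat, user_lng) <= c.radius_m
    ]
```

In a real integration the surviving chunks would be re-serialized into the user’s context window before each inference call.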

Context Window
  

This proactive approach ensures that the context window is lean, and the errors and latency incurred by tool-calling can be skipped entirely for many interactions.

A short-term (~30 minute) memory of the user’s movements is exposed as natural language to ground interactions.

In some cases, these memories attach to special subsystems, such as remembering when/where a tunnel was entered, parking locations, or public transit boardings.
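One way such a rolling movement memory could work is a time-windowed event log rendered to natural language. The class, event phrasing, and eviction policy below are invented for illustration; the ~30-minute window is the only figure taken from the text.

```python
# Sketch: a ~30-minute movement memory rendered as natural language.
# Event names and phrasing are assumptions, not Wysp's format.
from collections import deque

WINDOW_S = 30 * 60  # short-term horizon (~30 minutes, per the docs)

class MovementMemory:
    def __init__(self):
        self.events = deque()  # (timestamp_s, phrase)

    def record(self, t, phrase):
        self.events.append((t, phrase))
        # Evict anything older than the window; special subsystems
        # (parking, tunnels, transit boardings) would persist elsewhere.
        while self.events and t - self.events[0][0] > WINDOW_S:
            self.events.popleft()

    def render(self, now):
        """Render the surviving events as grounding text for the agent."""
        lines = []
        for t, phrase in self.events:
            mins = max(1, round((now - t) / 60))
            lines.append(f"{mins} min ago: {phrase}")
        return "\n".join(lines)
```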

Context Window
  

An encoding of the user’s preferences and reference points is persisted in an introspectable format that is durable across model updates and context compression.

This encoding is continuously updated implicitly by user interactions, and can also be explicitly altered by direct requests.

These memories inform the interpretation of the user’s queries, and the behavior of the navigation engine.

[user]

don’t take me on this road again

[wysp]

Ok! I’ll avoid Vålveien in the future.

User Weight Updates

  • very unlikely to route on this road in the future
  • less likely to route on roads with similar exceptionalities
  • slightly less sensitive to time cost
  • slightly more sensitive to cost traits of similar roads

[wysp]

your ETA has slipped to 12:15

[user]

shut up

User Weight Updates

  • substantially less likely to have the ETA impulse again
  • slightly less likely to have other similar impulses
  • make fewer statements
  • give fewer details in statements
  • give fewer confirmations
  • give fewer repeated details

[wysp]

In Grunerløkka there’s a Harald’s Vaffel

[user]

is that the one by Olaf Ryes plass?

[wysp]

Yes! Just a few steps down Markveien from there.

User Weight Updates

  • familiarity with named geometry “Olaf Ryes plass”
  • familiarity with place “Harald’s Vaffel”
  • familiarity with area “Grunerløkka”
  • familiarity with city “Oslo”
  • familiarity with named geometry “Markveien”

[user]

just call it “home”

[wysp]

Got it, Pilestredet 47C is “home”

User-Private Memory

{
  "name": "Home",
  "streetAddress": "Pilestr...",
  "coordinates": {
    "lat": 59.9213,
    "lng": 10.7323
  }
  ...
}
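The durable preference encoding described above can be imagined as a small, introspectable store of named places plus routing-weight nudges, serialized as plain JSON so it survives model updates and context compression. Every key, method, and magnitude below is an assumption for illustration, not Wysp’s actual schema.

```python
# Sketch: an introspectable user-private memory with named places and
# routing-weight nudges. Keys and magnitudes are invented, not Wysp's schema.
import json

class UserMemory:
    def __init__(self):
        self.places = {}  # alias -> address/coordinate record
        self.weights = {"time_cost": 1.0, "avoid": {}}

    def name_place(self, alias, record):
        # "just call it 'home'" -> a durable, user-private alias
        self.places[alias.lower()] = record

    def avoid_road(self, road, strength=0.9):
        # "don't take me on this road again" -> a strong, durable penalty
        self.weights["avoid"][road] = strength

    def to_json(self):
        """Serialize to plain JSON so the encoding is portable and inspectable."""
        return json.dumps({"places": self.places, "weights": self.weights})
```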

The navigation engine is driven forward by GPS updates, time passing, and user interactions. It manages routing, narration, and generating context based on the user’s recent movements and future objectives.

Based on raw GPS fixes, it maintains a continuous simulation and forecast of the user’s position, stateful snap-to-road that takes recent history into account, and detection of parking locations, route deviations, unplanned stops, and destination arrivals.

The navigation engine contributes live context chunks that ground any user interactions in the present and future aspects of the navigation. It handles clarifications, exception reports, route modifications, ETAs, etc.
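“Stateful snap-to-road” can be sketched as a scoring problem where candidate roads compete on distance, with a bonus for the road matched on recent fixes. The function, road representation, and stickiness weight below are invented for illustration; a production matcher would also use heading, speed, and road topology.

```python
# Sketch: history-aware snap-to-road. Candidates are scored by distance,
# with a bonus for staying on the previously matched road.
# Road names and the stickiness weight are invented for illustration.
import math

def snap(fix, roads, history, stickiness_m=15.0):
    """
    fix:      (x, y) position in meters (local planar frame)
    roads:    {name: (x, y)} nearest point on each candidate road, pre-projected
    history:  road names matched for recent fixes, most recent last
    Returns the chosen road name.
    """
    last = history[-1] if history else None
    best, best_cost = None, float("inf")
    for name, (rx, ry) in roads.items():
        cost = math.hypot(fix[0] - rx, fix[1] - ry)
        if name == last:
            cost -= stickiness_m  # prefer staying on the current road
        if cost < best_cost:
            best, best_cost = name, cost
    return best
```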

Context Window
  

Statements are pre-rendered audio chunks that can be preloaded onto the client device and triggered locally in response to GPS movements. They are an optional (but strongly recommended) optimization that allows navigation to continue unhindered even in the event of network loss.

You can ensure they are heard in your agent’s voice by integrating Custom Voicing.
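Local triggering of preloaded Statements amounts to geofence checks against incoming GPS fixes, with each statement firing at most once. The field names and the player class below are illustrative assumptions, not Wysp’s actual Statement format.

```python
# Sketch: client-side triggering of preloaded Statements. Each statement
# carries a trigger point and radius, and fires once when a GPS fix comes
# within range. Field names are illustrative, not Wysp's format.
import math

R_EARTH_M = 6371000

def _haversine_m(lat1, lng1, lat2, lng2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R_EARTH_M * math.asin(math.sqrt(a))

class StatementPlayer:
    def __init__(self, statements):
        # statements: [{"audio": ..., "lat": ..., "lng": ..., "radius_m": ...}]
        self.pending = list(statements)

    def on_gps(self, lat, lng):
        """Return the audio ids of any statements whose trigger just fired."""
        fired, still_pending = [], []
        for s in self.pending:
            if _haversine_m(lat, lng, s["lat"], s["lng"]) <= s["radius_m"]:
                fired.append(s["audio"])
            else:
                still_pending.append(s)
        self.pending = still_pending  # fire-once semantics
        return fired
```

Because the check runs entirely on-device, guidance keeps working through network loss, as the text above describes.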

Structured metadata is emitted for use client-side, in the form of Route Geometry, HUD items, and a Plan.

Wysp is focused primarily on screenless experiences, but this metadata is sufficient for rendering most traditional navigation UI controls when needed.

{
  "travelMode": "WALKING"
}

Transport Mode

{
  "destinations": [
    {
      "name": "Eerste Jan van..."
    }
  ]
}

Destinations

{
  "geometry": {
    "fix": null,
    "triggers": {},
    "routeLines": [],
    "routePoints": []
  }
}

Geometry

{
  "plan": {
    "durationSeconds": 965,
    "distanceMeters": 1150
  }
}

Plan

Wysp leverages personalized memory, local knowledge, and landmarks to communicate in a dense and unambiguous way.

Particular attention is paid to the disambiguation of departures, arrivals, and ambiguous or complex manoeuvres.

[agent]

Turn left.

Simple Turn

[agent]

Turn left after the 7-11.

Ambiguous Placement

[agent]

Turn slightly left at the roundabout, heading uphill from the third exit.

Ambiguous Exit

[agent]

Go straight through all the roundabouts.

Repetition Collapse

[agent]

Follow signs for Gothenburg.

Anchor-destination Collapse

[agent]

Go back the way you came.

Short-term memory

[agent]

To get started, head downhill towards the lights on Sannergata.

Descriptive Departure

[agent]

Økernveien 5B is coming up on the right, the green apartment building with entrance directly on Økernveien.

Descriptive Arrival

[agent]

Stay on this road for the next 5 minutes.

Pre-silence Comfort

When a route looks simple enough to be communicated (and remembered) in one shot, the entire narration is delivered as a single Utterance:

[agent]

3 minutes to walk. Head down Rosenkrantzgate towards the water, take the second left onto Karl Johan’s gate and the Grand Hotel will be right there on your left.

If turn-by-turn guidance has been rejected by the user, they can still ask for chunked guidance on-demand:

[user]

how long to drive to office?

[agent]

About 20 minutes. Traffic is normal. Do you want directions?

[user]

no I know the way

[user]

where do I go next?

[agent]

Continue on Ring 2. When you hit the roundabout, turn left, following the train tracks. 5 minutes to destination.

A typical full integration involves 3 parties:

  1. Inference/App Server - Server where context is assembled and inference triggered for the agent the user is conversing with.
  2. Navigation Device - The navigation client (e.g. a smartphone app or custom hardware) that travels with the user.
  3. Wysp Server - Wysp’s API and CDN servers.

The App Server connects to Wysp via the Admin API to manage Users, and create ephemeral Wysp access tokens to be handed out to individual users.

The Inference Server connects to Wysp via the User API, using 1 websocket connection per active User. Over that socket, tool calls are sent and context is received to be integrated into the User’s context window for inference.

The Navigation Device sends GPS updates to Wysp via the User API websocket, and receives preloadable turn-by-turn audio Statements and other UI-relevant metadata such as routelines and HUD metadata.

Integration of the Admin API must be done from a secure server-side location, but the User API can be used from anywhere.
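The token hand-off above can be sketched end to end: the App Server mints an ephemeral token over the Admin API, and any client then opens a User API websocket with it. The endpoint path, websocket URL, field names, and message shape below are all hypothetical, chosen only to illustrate the flow.

```python
# Sketch of the token hand-off. All URLs, paths, and field names are
# hypothetical placeholders, not Wysp's actual API surface.
import json

ADMIN_BASE = "https://api.wysp.example"       # hypothetical
USER_WS_BASE = "wss://api.wysp.example/user"  # hypothetical

def mint_token_request(user_id):
    """Request the App Server would send, server-side, over the Admin API."""
    return {
        "method": "POST",
        "url": f"{ADMIN_BASE}/admin/users/{user_id}/tokens",
        "body": {"ttl_seconds": 3600},
    }

def user_socket_url(token):
    """Websocket URL a client (inference server or device) would open."""
    return f"{USER_WS_BASE}?token={token}"

def gps_update_message(lat, lng):
    """Example GPS update frame a navigation device might send over the socket."""
    return json.dumps({"type": "gps", "lat": lat, "lng": lng})
```

Because the ephemeral token is the only credential the client holds, the Admin API secret never leaves the server side, matching the split described above.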

Wysp expects that, during usage, 1 or more clients are connected on behalf of the user via the User API.

In general, the following activities must be happening over those sockets:

  • consumption of context for agent inference
  • sending of tool calls by agent
  • sending of GPS updates
  • consumption of turn-by-turn guidance and other navigation metadata

Importantly, it doesn’t matter how many different devices and sockets are connected for a User, and those responsibilities can be split among multiple clients and servers as needed.

Wysp is in the preview release stage for testing with early partners. If you are interested in getting pre-release access to Wysp, email hello@wysp.ai to get in touch.