musing on sensor systems

There are some sensors that are deployed into the world unto themselves. Even so, they can be regarded as a system: they have power management, storage, sensors, and an enclosure that must all work together effectively in order to accomplish their task.

As sensors grow more complex, they might draw continuous power, dropping batteries. They might not store data locally, but instead on a server. Through all of this, though, the sensor is part of a system, and that’s the critical realization of the last day or two of musing.

sensors as systems

Imagine a sensor that monitors the presence of wifi devices. It does not do this to track individuals, or monetize anything, but instead to answer the question “how many people came and went from a place during a day?” The sensor has an ephemeral quality to it; it might “know” that the same device appeared more than once in a day, but it would not recognize the same device tomorrow. Nor does it care to record anything that might invade an individual’s privacy; it is enough to know they were present for 37 minutes between 1PM and 2PM (for example).
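
One way to get that ephemeral quality, sketched here rather than prescribed: keep only salted hashes of device addresses, and throw the salt away at the end of each day. The class name, the use of SHA-256, and the idea of a once-a-day reset are all my assumptions; the sketch covers only the "how many distinct devices today?" part, not the time-present bookkeeping.

```python
import hashlib
import secrets


class DailyDeviceCounter:
    """Counts distinct devices per day without keeping raw addresses (hypothetical sketch)."""

    def __init__(self):
        # Random salt that lives only in memory; it is never stored or reported.
        self._salt = secrets.token_bytes(16)
        self._seen = set()

    def observe(self, mac_address: str) -> None:
        # Only the salted digest is kept; the raw address is discarded.
        normalized = mac_address.lower().encode("utf-8")
        digest = hashlib.sha256(self._salt + normalized).hexdigest()
        self._seen.add(digest)

    def count(self) -> int:
        # Number of distinct devices observed since the last reset.
        return len(self._seen)

    def start_new_day(self) -> None:
        # A fresh salt means tomorrow's digests cannot be linked to today's.
        self._salt = secrets.token_bytes(16)
        self._seen.clear()
```

Within a day, the same address hashes to the same digest, so repeat appearances are recognized; after `start_new_day()` the sensor has no way to recognize yesterday's devices.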

I want that sensor to report its data; there are going to be hundreds, if not thousands, of them in the world, and it is too much to imagine collecting the data from the sensors individually. We’ll assume they have power and network, so they can POST their data to a server somewhere. So far, so good.
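
As a rough sketch of what "POST their data" could look like using only the Python standard library: the URL, the payload fields, and the bearer-style `Authorization` header are placeholders I made up for illustration, not a settled API.

```python
import json
import urllib.request


def build_report_request(server_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying one sensor report."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the server accepts the sensor's key as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Sending it would be something like `urllib.request.urlopen(request, timeout=10)`, wrapped in whatever retry logic the sensor needs.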

The big questions that bother me, though, are questions of trust and deployment. How do we know a sensor is a trustworthy sensor (and not one established by a vagabond seeking to inject bad data into our network), and how do we go from zero sensors to thousands… all while maintaining trust?

This is where the heading sensors as systems comes from. Collections of sensors are often referred to as sensor networks, although that term is sometimes used to denote a set of sensors that talk to each other. In this case, the sensors do not talk to each other, so I’ll use the term sensor system instead. The sensors cannot be thought of as independent of the server they talk to; they are not two separate pieces, but instead two pieces of a kind; they are part of the same system. To conceptualize them separately makes things harder, not easier, to design and build.

the topology

Sensors send data to servers.

                         ┌──────┐
                     ┌───┤sensor│
                     │   └──────┘
                     │
                     │
                     │
                     │
                 ┌───▼────┐
                 │        │      ┌──────┐
                 │ server ◄──────┤sensor│
                 │        │      └──────┘
                 └───▲────┘
┌──────┐             │
│sensor├─────────────┘
└──────┘

That’s it. We assume servers are always present, always on, and have names on the network that can be resolved by DNS.

However, that’s a topology of the network when it is functioning and live. How does it come to be?

the server births sensors

How do we set this up?

owner                       agent                        server

                               ────────────────────────────►
                                       request sensor

  ◄────────────────────────────
        alert sensor req

                               ◄────────────────────────────
                                       download image

                               ───────────────► X
                                  report data

  ────────────────────────────►
      approve sensor req

                               ────────────────────────────►
                                        report data

An agent sets up a sensor configuration on the server and downloads an image. This is where the sensor is told where it will live and its “sequence ID,” and it is given a unique API key so it can submit data to the server. However, once set up, it will fail to submit data to the server; the API key is not yet approved.

The server owner receives a notice that a new sensor has been configured, and can use the information provided to decide if it is legitimate. If so, the server owner approves the sensor, and from that point forward, data will be successfully submitted and stored.

In this way, we establish the chain of trust between sensors and the server. The server creates the sensor image; when it creates the image, it can embed a secret that only that particular sensor knows. However, we do not yet trust that sensor, because the server owner has not acknowledged it.
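
The request, reject, approve, accept sequence can be modeled in a few lines. This is a minimal in-memory sketch; the class and method names are hypothetical, and a real server would persist this state and notify the owner.

```python
import secrets


class SensorRegistry:
    """Tracks which sensor API keys the server owner has approved (in-memory sketch)."""

    def __init__(self):
        self._sensors = {}   # api_key -> {"location": str, "approved": bool}
        self.readings = []   # (api_key, reading) pairs that were accepted

    def request_sensor(self, location: str) -> str:
        # The server mints the key that will be embedded in the sensor image.
        api_key = secrets.token_urlsafe(32)
        self._sensors[api_key] = {"location": location, "approved": False}
        return api_key

    def approve(self, api_key: str) -> None:
        # Called once the owner decides the request is legitimate.
        if api_key not in self._sensors:
            raise KeyError("unknown sensor key")
        self._sensors[api_key]["approved"] = True

    def report(self, api_key: str, reading: dict) -> bool:
        # Data from unknown or not-yet-approved sensors is dropped.
        sensor = self._sensors.get(api_key)
        if sensor is None or not sensor["approved"]:
            return False
        self.readings.append((api_key, reading))
        return True
```

A freshly flashed sensor calling `report()` gets `False` back until the owner calls `approve()` for its key, which matches the "X" in the diagram above.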

How do we establish trust between the server owner and the creators of sensors? The “old fashioned” way might be to require sensor creators to provide a phone number, and the server owner calls those people. Another might be that we know the sensor creators will be members of a known set of individuals. Therefore, we could share a “secret” with them: a passphrase, perhaps, that is used at time of sensor creation. This is not a “password” in the strictest sense, but instead something that indicates that the sensor creator has knowledge that is not “common.” It might be a single word (e.g. “bulldogs”), or it might be a phrase (“The celery stalks at midnight”). If the sensor creator can enter that “shared secret” at sensor creation time, that is enough for the server owner to accept or reject the sensor with a high degree of confidence that the creator can be trusted.

This may sound “low security,” but we are wondering how to establish a computational chain of trust within a known community. The particular use-case is such that a server owner will know a priori all the possible creators of sensors. A low-security “passphrase,” shared within that (high-trust, small) community, is enough to throttle or otherwise filter bogus requests for sensor creation.
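
A sketch of that passphrase filter, under my own assumptions: the server keeps only a SHA-256 digest of the community passphrase, and the comparison ignores case and extra whitespace so that a phrase typed slightly differently still passes. An unsalted hash would be a poor choice for real passwords; here it is only meant to filter bogus requests.

```python
import hashlib
import hmac


def passphrase_digest(passphrase: str) -> bytes:
    # Assumption: case and runs of whitespace are not significant.
    normalized = " ".join(passphrase.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).digest()


def passphrase_matches(candidate: str, expected_digest: bytes) -> bool:
    # compare_digest avoids leaking how much of the digest matched via timing.
    return hmac.compare_digest(passphrase_digest(candidate), expected_digest)
```

The server would store `passphrase_digest("The celery stalks at midnight")` once, then call `passphrase_matches()` on whatever the sensor creator types at creation time.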

small images: netinst

To build a sensor, we need to bootstrap a piece of hardware from a chunk of plastic and silicon to a functioning computer. This involves installing an OS on the device. But… where do we get the OS? Perhaps the server should provide a customized image, such that every sensor gets a unique bootstrap.

This is almost what openBalena does to set up a sensor network. openBalena establishes a server, and from it, you can download a “seed image” for the edge nodes in the network. The seed is a binary image to be flashed onto the HW, and there is only one seed image. However, we don’t want to create full binary images for these sensors (yet). Instead, we want to use an existing open platform to bootstrap the sensors, and make sure those sensors are configured securely and running our sensor software. And, we probably do need to generate a unique “seed” image for each sensor for our particular use-case.

We can do this by letting each sensor bootstrap itself from a tiny netinstall image into a full device. There’s a nifty Raspberry Pi net install package on GitHub. It works like this:

You grab a FAT32-formatted SD card. (This is the typical filesystem on SDHC cards, which run from just over 2GB up to 32GB; larger SDXC cards usually ship formatted as exFAT.)
You copy the files from the .zip onto the card.
Plug the Pi into a wired network with DHCP.
Stick the card in the RPi, and the card bootstraps and proceeds to rebuild itself in place.

When you’re done, you have a full Raspberry Pi running the most recent version of Raspbian, configured and customized to your liking.

customization

What’s fun about the netinstall is that it has multiple ways to customize the installation.

First, you can set a few parameters in an installer-config.txt. These parameters include things like the base set of packages to pull (minimal, server, etc.), the timezone, and the final action after setup (reboot, poweroff, etc.). There are more “advanced” parameters that can be configured, including a set of files made available at configuration time (e.g. additional config files, binaries, and so on), a post-install script, and (particularly interestingly) a URL for an online_config that will be executed after the installer-config is executed.
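
Since the server would be generating one of these files per sensor, here is a small sketch of rendering a settings dictionary as `key=value` lines. The key names and values below (`preset`, `timezone`, `final_action`, `online_config`) are my reading of the options described above and should be checked against the installer's own documentation; the URL is a placeholder.

```python
def render_installer_config(settings: dict) -> str:
    """Render settings as key=value lines, one per line."""
    lines = []
    for key, value in settings.items():
        # Refuse anything that would break the one-setting-per-line format.
        if "=" in key or "\n" in key or "\n" in str(value):
            raise ValueError(f"cannot encode setting {key!r}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical per-sensor settings; key names are assumptions, not verified.
example_settings = {
    "preset": "server",
    "timezone": "America/New_York",
    "final_action": "reboot",
    "online_config": "https://sensors.example.org/config/sensor-0042.txt",
}
```

The server could write `render_installer_config(example_settings)` into the bundle as installer-config.txt before zipping it up for download.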

The combination of these means that we should, for example, be able to do the following:

Add an additional repository to drive the installation of additional packages at time of device setup.
Place the API key (or, perhaps, a signed JWT?) in the filesystem at a known location for later discovery by sensing software.
Execute arbitrary code/scripts post-install, allowing for additional customization or lockdown.
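
For the second item, a sketch of what "a known location" might look like from a post-install script's point of view. The path is whatever the sensor software and the installer agree on (I have not picked one), and restricting the file to its owner is my own addition.

```python
import os
from pathlib import Path


def write_api_key(path: Path, api_key: str) -> None:
    """Write the key so that only the owning user can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 rather than tightening it afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(api_key + "\n")
    # If the file already existed, make sure its permissions are still tight.
    os.chmod(path, 0o600)


def read_api_key(path: Path) -> str:
    """Read the key back, as the sensing software would at startup."""
    return path.read_text().strip()
```

The sensing software then only needs to know the agreed path; it never needs to know how the key got there.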

The entire bundle is approximately 64MB. A full Raspbian image can be 10-20x larger. If 500 users tried to create sensors “at once,” we would only be storing 32GB of data. Bootstrap images can be removed after download, meaning we should not have to worry about disk capacity (modulo some moderate DDoS protections).
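
The back-of-the-envelope numbers, written out so they are easy to redo with different assumptions. I am using decimal units (1GB = 1000MB) and the 10-20x range quoted above for a full image.

```python
BUNDLE_MB = 64          # approximate size of one netinstall bundle
PENDING_BUNDLES = 500   # bundles waiting to be downloaded at once


def storage_gb(image_mb: float, count: int) -> float:
    """Total storage in GB for `count` images of `image_mb` megabytes each."""
    return image_mb * count / 1000


netinst_gb = storage_gb(BUNDLE_MB, PENDING_BUNDLES)          # 32.0
full_low_gb = storage_gb(BUNDLE_MB * 10, PENDING_BUNDLES)    # 320.0
full_high_gb = storage_gb(BUNDLE_MB * 20, PENDING_BUNDLES)   # 640.0
```

So the same burst of 500 requests would need somewhere between 320GB and 640GB if we were handing out full images instead.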

automation

This can also be moved off of the server. The server owner can get an API key that allows them to run this entire process offline. In this way, if the server owner wants to (say) create 50 card images, a small script could be provided that:

Creates the custom data from a CSV file,
Retrieves an API key (or signed token, or whatever) from the server,
Copies it to a uSD card, and
Saves the bundle of metadata to the server.
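
The first two steps of that script might look something like this. The CSV column names (`location`, `sequence_id`) are assumptions, and the call to the server is passed in as a function so the sketch runs offline; copying to the uSD card and saving metadata back to the server are left out.

```python
import csv
import io
import secrets


def provision_batch(csv_text: str, fetch_api_key) -> list:
    """Build one metadata bundle per CSV row, asking `fetch_api_key` for each key."""
    bundles = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bundles.append({
            "location": row["location"],
            "sequence_id": int(row["sequence_id"]),
            "api_key": fetch_api_key(row),
        })
    return bundles


def fake_fetch_api_key(row: dict) -> str:
    # Stand-in for the real request to the server; generates a local random key.
    return secrets.token_urlsafe(16)
```

Each bundle would then be written to a card and, once the card is done, reported back to the server for approval in bulk.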

In this way, the server is not storing images, but just tracking the relevant API keys and “enabling” or otherwise “approving” the sensors in a bulk sequence of actions. The uSD cards can then be created as quickly as a server owner can insert and remove cards from their computer.

Because it is lightweight, it could even be that a Raspberry Pi is used as the “sensor creation” device.

QUESTION: Should a small binary application be created that… creates the uSD cards? In this way, end-users do not go to a web page and download a zip (that must be decompressed and put on a uSD card), but instead download a small application that lets them enter their configuration data, and it writes it directly to a uSD card? Would this eliminate potential points of failure?

aside: a network of servers

It should be possible for an individual to set up a server easily, and once set up, establish their own sensor network. Given our use-case, each server might provide an easy way to provide its data via an open API. However, we also imagine these servers as part of a federation of data collection servers.

Because we imagine our servers as collecting and providing open, public data, we should have a “mirror” API endpoint. This endpoint might allow an upstream agent to request data from the server, and in doing so, create a centralized resource containing all of the data from all of the live servers in the network. It’s a small thing, and requires no new trust networks, but it is a component that might need to be considered as part of the whole.

conclusion

The sensor network needs to be thought of as a whole. Sensors are not built and configured independently of the server; they are created by the server, and in this way, their provenance and connection to the server are unquestionable.

Matt Jadud