PDA Functions

Lambda functions inside a PDA for edge analytics

PDA functions are the PDA's capability to run arbitrary algorithms on data held within the PDA without leaking that data anywhere else. They extend the core PDA functionality with algorithms ranging from simple summaries of PDA data to personal AI. Think of them as private serverless functions.

PDA functions were previously known as "Smart Hat Engine" (SHE) functions.

Key goals for PDA functions are:

  • Supporting algorithms written in a wide range of languages, providing flexibility in choosing the best tool for the job and limiting the need to reimplement any existing algorithms.

  • Not forcing open-sourcing of the algorithms. As many organisations consider their algorithms to be the "secret sauce", they should be able to maintain their secrecy when operating within the HAT ecosystem.

  • Generating more data for the person — while an organisation owns their algorithm, the person still owns all of their data.

  • Providing a trusted environment that is sufficiently isolated from the core PDA to eliminate the risks of unauthorised data access and to respect the legal data rights of the PDA owner.

  • Preventing any personal data leakage while running potentially untrusted, closed-source algorithms.

  • Elasticity in scaling — supporting large numbers of users as well as minimising resource cost when inactive, without burdening the algorithm developer.

Containerised applications appear to be the obvious choice because they can be written in any language and come with isolation guarantees. The rest can be controlled through a well-defined interface between the PDA itself and the AWS Lambda runtime. Algorithms run in an isolated environment with no ability to communicate with the outside world, enforced through firewalls and security policies. An algorithm only runs reactively in response to a request from a PDA: it processes the received data and returns the results in the response. The downsides of this approach are that a function cannot accumulate data over longer periods of time (the PDA does that itself), it cannot aggregate data across multiple users, and the algorithms that can be executed are limited to ones fixed ahead of deployment, whether traditional code or pre-trained Machine Learning models. Serverless environments (such as AWS Lambda) deliver the remaining goals of elasticity, on-demand use and ease of deployment.

A current limitation of the AWS Lambda environment is that it provides little detail and no guarantees on how a specific container instance gets reused, which opens up possibilities for timing-related attacks. Specifically, a common optimisation is to retain some state in a given container (more in the sense of caching than storage, as there is no guarantee that the same container will be used again); however, that state can also contain data previously received from a PDA. Although interactions with a given function are driven by PDAs and not by the functions themselves, and functions are unable to communicate with the outside world, a malicious function could return custom responses to a specific PDA controlled by the perpetrator. This risk is mitigated through metadata logging, but additional controls around function scheduling and execution could eliminate it.

Building a function

PDA functions are currently standard AWS Lambda functions and benefit from a wealth of information on how to build such functions.

While an over-simplification, it is not inaccurate to say that you can just drop in an algorithm you have already written or write one in any major language and framework:

  • Node.js

  • Java (Java 8 and other languages that are supported by the runtime — we ❤️Scala!)

  • Python

  • .NET Core

  • Go

Furthermore, the PDA uses the industry-standard JSON format for handling data, so what your algorithm receives is simply a bundle of JSON records (sometimes called documents) matching your specific Data Bundle query (check the docs on Data Bundles for more details).

Your function needs to do 3 things:

  1. Publish its configuration to simplify editing the details.

  2. Return a Data Bundle definition specifying what data it wants to receive, parametrised by the date range (fromDate and untilDate query parameters in ISO8601 format).

  3. Accept a data processing request, which includes the current known configuration from the PDA and the data itself, retrieved using the bundle returned in (2).

A common recommendation is to split your algorithm details from the Lambda function handling details, as it makes testing and debugging a lot simpler. You should try to develop your entire algorithm outside the PDA (the Serverless Framework includes a helpful set of tools for that), exposing the three steps above as separate API Gateway endpoints. You should be able to feed the generated Data Bundle definition into the PDA you use for development, and then feed the data extracted from the PDA using that bundle into your algorithm for processing.
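As a rough illustration of that split, the sketch below keeps the algorithm in a plain function and wraps it in three thin handlers, one per step. It is not the reference implementation: the handler names, the `record-counter` ID, the `records` field of the request body, the bundle shape and the API Gateway proxy-style event and response are all assumptions made for the example.

```python
import json

# Hypothetical configuration; property names follow the reference tables in
# this document, the values are invented for illustration.
CONFIGURATION = {
    "id": "record-counter",
    "trigger": {"triggerType": "individual"},
}


def count_records(records):
    """Pure algorithm: no Lambda or PDA details, so it is easy to unit test."""
    counts = {}
    for record in records:
        endpoint = record.get("endpoint", "unknown")
        counts[endpoint] = counts.get(endpoint, 0) + 1
    return counts


def _response(body):
    # API Gateway proxy-style response (assumed integration type).
    return {"statusCode": 200, "body": json.dumps(body)}


def configuration_handler(event, context):
    """Step 1: publish the function configuration."""
    return _response(CONFIGURATION)


def bundle_handler(event, context):
    """Step 2: return a bundle definition for the requested date range."""
    params = event.get("queryStringParameters") or {}
    bundle = {
        "name": CONFIGURATION["id"],
        # Illustrative shape only; the Data Bundles docs define the real one.
        "fromDate": params.get("fromDate"),
        "untilDate": params.get("untilDate"),
    }
    return _response(bundle)


def process_handler(event, context):
    """Step 3: accept configuration plus data and return generated results."""
    payload = json.loads(event["body"])
    records = payload.get("records", [])  # assumed field name
    return _response({"counts": count_records(records)})
```

Because `count_records` knows nothing about Lambda, it can be tested with ordinary lists of dictionaries, while the handlers only translate between HTTP events and the algorithm.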

Everything else is the details of your own implementation!

Limitations

AWS Lambda functions, and by extension PDA functions, have some limitations worth noting:

  • You can allocate between 128MB and 3008MB of memory to the function

  • It has 512MB of "ephemeral" disk storage for temporary files. Do not rely on it persisting between runs.

  • Running time is limited to 5 minutes max — you will need to manage efficiency and amount of data processed in a run.

  • Maximum request size is 6MB. You will not be able to process a huge amount of an individual PDA's data in one go, but 6MB can fit a lot of JSON.

  • The deployment package can be no bigger than 250MB (though you can load e.g. your models externally, taking care to make sure the algorithm execution does not time out when including loading of the model)

  • PDA-specific: you cannot communicate with any remote networked resources, even if you have a great use case. This limits the possibility of leaking the user's data.

  • PDA-specific: function execution is driven by the PDA itself, so you cannot subscribe to other sources of Lambda events
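When feeding extracted PDA data into your function during development, it can help to check payloads against the request size limit before sending them. The helper below is a sketch under assumptions: `batch_records` is a hypothetical name, and it measures size using Python's default `json.dumps` encoding, which may differ slightly from the encoding actually used on the wire.

```python
import json

MAX_REQUEST_BYTES = 6 * 1024 * 1024  # the 6MB request limit quoted above


def batch_records(records, max_bytes=MAX_REQUEST_BYTES):
    """Group records into lists whose JSON encoding stays within max_bytes."""
    batches, current, current_size = [], [], 2  # 2 bytes for "[" and "]"
    for record in records:
        size = len(json.dumps(record).encode("utf-8"))
        if size + 2 > max_bytes:
            raise ValueError("a single record exceeds the request limit")
        # json.dumps separates list items with ", ", which costs 2 bytes.
        extra = size + (2 if current else 0)
        if current and current_size + extra > max_bytes:
            batches.append(current)
            current, current_size = [], 2
            extra = size
        current.append(record)
        current_size += extra
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be sent as a separate request; a record that is too large on its own is reported rather than silently dropped.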

When PDA functions are executed

Each function available in a particular PDA cluster is registered in the PDA's static configuration, which provides the ID of the function along with the version to be used, namespace and endpoint the function is allowed to publish data to and the details necessary for the PDA to know how to invoke it.

The PDA internally tracks data "events" and uses incoming events to determine which functions may need to be invoked on the data. The current approach is rather straightforward: the PDA accumulates a batch of events and checks which endpoints they were for. It then compares that set of endpoints against the functions enabled for the PDA and, if there is an overlap, checks the trigger details of each matching function. A new function execution, covering all data matching the bundle since the last execution, is started when the trigger is either individual (the function should run for every individual data record) or periodic and at least the specified period of time has passed since the last invocation.
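The decision described above can be sketched roughly as follows. This is an illustration rather than the PDA's actual code: the dictionary layout, the `endpoints` key (standing in for the endpoints referenced by the function's bundle) and the use of `timedelta` for the period are assumptions.

```python
from datetime import datetime, timedelta, timezone


def should_execute(function, event_endpoints, now):
    """Decide whether a batch of data events warrants running a function."""
    if not function["enabled"]:
        return False
    # Only functions whose endpoints overlap with the events are considered.
    if not set(function["endpoints"]) & set(event_endpoints):
        return False
    trigger = function["trigger"]
    if trigger["triggerType"] == "individual":
        return True
    if trigger["triggerType"] == "periodic":
        last = function.get("lastExecution")
        return last is None or now - last >= trigger["period"]
    return False  # manual functions only run when triggered via the API


# Hypothetical function entry, last run 30 minutes ago with a 1 hour period.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
hourly = {
    "enabled": True,
    "endpoints": ["twitter/tweets"],
    "trigger": {"triggerType": "periodic", "period": timedelta(hours=1)},
    "lastExecution": now - timedelta(minutes=30),
}
run_now = should_execute(hourly, ["twitter/tweets"], now)
```

In this example `run_now` is false because the period has not yet elapsed; the same events arriving an hour later would start a new execution.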

It is important to note that unless triggered manually via an API endpoint, functions for a PDA will not run if there is no new data coming in, generating data events which in turn trigger functions. In a completely inactive PDA, such functions would never be executed.

How PDA functions are executed

Every time the PDA decides it needs to execute a function, it performs three steps:

  1. asks the function to provide it with a Data Bundle definition for the timeframe between the most recent execution and now

  2. sends the current known function configuration (including the last execution time), together with the data retrieved for the bundle definition, to the function

  3. saves the returned data and the time of execution

This results in the generated data becoming available for the PDA owner and other applications the same way as any other data, with no need to deal with the complexities of running algorithms, managing dependencies between components or running dedicated infrastructure.
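The three steps can be sketched from the PDA's point of view as below. This is a simplified illustration, not the PDA's implementation: `execute_function` and the four injected callables (`get_bundle`, `query_data`, `process`, `save`) are hypothetical names standing in for the HTTP calls and storage the PDA performs.

```python
def execute_function(config, now, get_bundle, query_data, process, save):
    """Run one function execution and return the updated configuration."""
    last = config["status"].get("lastExecution")

    # Step 1: ask the function which data it wants for the elapsed timeframe.
    bundle = get_bundle(from_date=last, until_date=now)

    # Step 2: retrieve that data and send it along with the configuration.
    data = query_data(bundle)
    results = process(config, data)

    # Step 3: save the generated data and record the execution time.
    save(results)
    status = {**config["status"], "lastExecution": now}
    return {**config, "status": status}
```

Passing the collaborators in as arguments keeps the sketch self-contained and lets each step be replaced with a fake when testing the flow.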

Function information reference

Each function publishes its configuration through a Lambda function handler; this section details what information is included. Note that publishing of the function is managed by the PDA Service Provider and the information will always be reviewed.

FunctionConfiguration

| Property | Type | Description |
|---|---|---|
| id | String | unique function identifier |
| info | FunctionInfo | defines user-visible information about the function, including any descriptions and graphics |
| developer | ApplicationDeveloper | defines function developer details |
| trigger | FunctionTrigger | when the function processes data |
| dataBundle | EndpointDataBundle | the default bundle of data the function uses |
| status | FunctionStatus | status of the function on the given PDA |

FunctionInfo

| Property | Type | Description |
|---|---|---|
| version | String | version ID of the function, in 3-digit semantic versioning format |
| versionReleaseDate | ISO8601 Date with Time | the date this version of the function was released |
| updateNotes | ApplicationUpdateNotes | any update notes for the current function release |
| name | String | user-readable function name |
| headline | String | headline (short description) of what the function does |
| description | FormattedText | long, formatted description of the function in plain text and optionally Markdown/HTML |
| termsUrl | String | URL to the terms between the PDA owner and the function developer |
| supportContact | String | support contact details (email address) |
| graphics | ApplicationGraphics | graphical elements to build the UI from, primarily images. Each follows the format of a “Drawable” object, which has a url to the “normal” size image as well as optional ones sized as small, large and extra-large, targeting different screen sizes |
| dataPreviewEndpoint | String, Optional | if the function's data can be previewed directly in the PDA App, the PDA API endpoint designated for the preview |

FunctionTrigger

| Property | Type | Description |
|---|---|---|
| triggerType | String (periodic, individual or manual) | defines how the PDA should invoke the function (periodically, for each individual record or manually) |
| period | ISO8601 Period or Number (milliseconds), only with periodic type | if executing periodically, the approximate period between subsequent executions |

FunctionStatus

| Property | Type | Description |
|---|---|---|
| available | Boolean | whether or not the function is currently available for use on the PDA |
| enabled | Boolean | whether or not the function has been enabled by the PDA owner |
| lastExecution | ISO8601 Date with Time, Optional | the last time the function was successfully executed (not set if never) |
| executionStarted | ISO8601 Date with Time, Optional | if currently executing, time when the current execution started |
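To make the trigger and status structures concrete, here is a small sketch that models them as Python dataclasses with the constraints stated in the reference above. The class layout and validation rules are one reading of that reference, not code taken from the PDA; the `is_running` helper is a hypothetical convenience.

```python
from dataclasses import dataclass
from typing import Optional, Union

TRIGGER_TYPES = {"periodic", "individual", "manual"}


@dataclass(frozen=True)
class FunctionTrigger:
    triggerType: str
    period: Optional[Union[str, int]] = None  # ISO8601 period or milliseconds

    def __post_init__(self):
        if self.triggerType not in TRIGGER_TYPES:
            raise ValueError(f"unknown triggerType: {self.triggerType!r}")
        if self.triggerType == "periodic" and self.period is None:
            raise ValueError("periodic triggers need a period")
        if self.triggerType != "periodic" and self.period is not None:
            raise ValueError("period is only valid with the periodic type")


@dataclass(frozen=True)
class FunctionStatus:
    available: bool
    enabled: bool
    lastExecution: Optional[str] = None     # ISO8601 date with time
    executionStarted: Optional[str] = None  # set only while executing

    @property
    def is_running(self):
        """True while an execution is in progress."""
        return self.executionStarted is not None


hourly = FunctionTrigger(triggerType="periodic", period="PT1H")
idle = FunctionStatus(available=True, enabled=True)
```

Validating in `__post_init__` means a malformed trigger is rejected when the configuration is built rather than when the function is first scheduled.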

Testing your function

Use the Function Testing Postman collection we have prepared for this purpose!

You will need to configure the function environment with your own function's API gateway details, choose the PDA you want to use for testing and update PDA credentials for the whole collection to run successfully. Once you are satisfied that it works correctly, please contact us to have it reviewed and integrated.

Function management

All function management is performed through endpoints to list, set up and disable functions and, in certain cases, to get the application token for the frontend to use in authenticating with a remote service.

Listing functions

Functions are listed at /api/v2.6/she/function, which returns the full list of available functions.

This is the only call needed to get a comprehensive list of functions along with their status on the PDA (available, enabled, execution time).

Information on an individual function is accessible at /api/v2.6/she/function/:function-id but this shouldn’t be needed in most cases. It has exactly the same information and format as a single item in the list returned by /api/v2.6/she/function.

Setting up

A function is set up by calling GET /api/v2.6/she/function/:function-id/enable

The steps of setting up a function with a PDA happen transparently after calling the enable endpoint.

Similarly, a function gets disabled by calling /api/v2.6/she/function/:function-id/disable. This takes care of recording the fact on the PDA, disabling any PDA access and suspending future function invocations.

Executing

Most functions are expected to be executed automatically by the PDA; however, it is still possible (with "owner" PDA permissions) to execute a function manually by calling GET /api/v2.6/she/function/:function-id/trigger
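The management endpoints above can be exercised from a small client such as the sketch below. The paths come from this document; everything else is an assumption for illustration, including the `pda.example.com` address, the `X-Auth-Token` header name, the JSON response format and the use of GET for the disable call.

```python
import json
import urllib.request

API_PREFIX = "/api/v2.6/she/function"
ACTIONS = ("enable", "disable", "trigger")


def function_url(pda_base_url, function_id=None, action=None):
    """Build the URL for the listing, a single function or an action on it."""
    if action is not None and action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if action is not None and function_id is None:
        raise ValueError("an action needs a function id")
    parts = [pda_base_url.rstrip("/") + API_PREFIX]
    if function_id is not None:
        parts.append(function_id)
    if action is not None:
        parts.append(action)
    return "/".join(parts)


def call(url, token):
    """GET the URL and decode the JSON body (header name is an assumption)."""
    request = urllib.request.Request(url, headers={"X-Auth-Token": token})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical PDA address; building URLs does not touch the network.
list_url = function_url("https://pda.example.com")
enable_url = function_url("https://pda.example.com", "data-feed-counter", "enable")
```

With a valid owner token, `call(enable_url, token)` would enable the function and `call(list_url, token)` would return the list with the updated status.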

Function availability

It is important to note that only PDA functions that have been registered with a PDA cluster will be available to use.

Registration is done in the PDA's configuration (application.conf), so new functions currently require the PDA to be redeployed with them included in the configuration:

she {
  functions = [
    {
      id = "data-feed-counter"
      version = "1.0.0"
      baseUrl = "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev"
      namespace = "she"
      endpoint = "insights/activity-records"
    }
    {
      id = "sentiment-tracker"
      version = "1.0.0"
      baseUrl = "https://ociflwukh1.execute-api.eu-west-1.amazonaws.com/dev"
      namespace = "she"
      endpoint = "insights/emotions"
    }
  ]
}

The configuration provides a list of functions, each identified by an ID, together with the version to be used, baseUrl as the address of the API Gateway, and the namespace and endpoint the function is allowed to create data in. The rest is done automatically by the PDA, including loading the full configuration, issuing the calls and saving the data.