Smart HAT Engine (SHE) is the HAT's capability to run arbitrary algorithms on data within the HAT PDA without leaking that data anywhere else. It extends the core HAT PDA functionality with algorithms ranging from simple summaries of HAT PDA data to personal AI.
Containerised applications appear to be the obvious choice: they can be written in any language and come with isolation guarantees. Everything else can be controlled through a well-defined interface between the HAT PDA itself and the Smart HAT Engine. In SHE, algorithms run in such an isolated environment with no ability to communicate with the outside world, enforced through firewalls and security policies. An algorithm runs only reactively, in response to a request from a HAT PDA: it processes the received data and returns the results in a response. The downside of this approach is that it does not allow data to be accumulated over longer periods of time (the HAT PDA does that itself), it does not allow data to be aggregated across multiple users, and the algorithms that can be executed are limited to ones fixed ahead of deployment, whether traditional code or pre-trained Machine Learning models. Serverless environments (such as AWS Lambda) meet the remaining goals of elasticity, on-demand use and ease of deployment.
A current limitation of the AWS Lambda environment is that it provides little detail and no guarantees about how a specific container instance gets reused, which opens up possibilities for timing-related attacks. Specifically, a common optimisation is to retain some state in a given container (caching rather than storage, since there is no guarantee that the same container will be used again); however, that state can also contain data previously received from a HAT PDA. Although interactions with a given function are driven by HAT PDAs rather than by the functions themselves, and functions are unable to communicate with the outside world, a function could still return custom responses to a specific HAT PDA controlled by the perpetrator. This, too, is mitigated through metadata logging, but additional controls around function scheduling and execution could eliminate the risk.
SHE functions are currently standard AWS Lambda functions and benefit from a wealth of information on how to build such functions.
While it is an over-simplification, it is not inaccurate to say that you can simply drop in an algorithm you have already written, or write one in any major language and framework.
Furthermore, the HAT PDA uses the industry-standard JSON format for handling data, so what your algorithm receives is simply a bundle of JSON records (sometimes called documents) matching your specific Data Bundle query (see the guide on Data Bundles for more details).
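Because the input is plain JSON, a "simple summary" algorithm needs nothing beyond the standard library. The sketch below assumes a hypothetical bundle shape: a JSON object mapping each bundle key to a list of records, each carrying `endpoint`, `recordId` and `data` fields. Check the Data Bundles guide for the actual structure before relying on it.

```python
import json
from collections import Counter


def summarise(bundle_json: str) -> dict:
    """Count records per source endpoint in a Data Bundle response.

    Assumed (hypothetical) shape: a JSON object mapping each bundle key to a
    list of records, each with "endpoint", "recordId" and "data" fields.
    """
    bundle = json.loads(bundle_json)
    counts = Counter()
    for records in bundle.values():
        for record in records:
            counts[record["endpoint"]] += 1
    return dict(counts)


# Illustrative input; the endpoint names are made up.
example = json.dumps({
    "posts": [
        {"endpoint": "example/posts", "recordId": "1", "data": {"text": "hello"}},
        {"endpoint": "example/posts", "recordId": "2", "data": {"text": "world"}},
    ],
    "notes": [
        {"endpoint": "example/notes", "recordId": "3", "data": {"text": "todo"}},
    ],
})
print(summarise(example))
```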
Your function needs to do 3 things:
untilDate query parameters in ISO8601 format).
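The surviving fragment above mentions date query parameters in ISO8601 format. Below is a minimal sketch of reading them from an API Gateway proxy event in Python. Only `untilDate` is named in the text, so the `fromDate` parameter name is an assumption, as is treating timestamps without an offset as UTC.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def _parse_iso8601(value: Optional[str]) -> Optional[datetime]:
    if value is None:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def date_range(event: dict) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Extract the (hypothetically named) fromDate/untilDate window."""
    # API Gateway proxy events set queryStringParameters to None when absent.
    params = event.get("queryStringParameters") or {}
    start = _parse_iso8601(params.get("fromDate"))
    end = _parse_iso8601(params.get("untilDate"))
    if start is not None and end is not None and start > end:
        raise ValueError("fromDate must not be later than untilDate")
    return start, end
```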
A common recommendation is to keep your algorithm separate from the Lambda function handling details; it makes testing and debugging a lot simpler. Try to develop your entire algorithm outside the HAT PDA (the Serverless Framework includes a helpful set of tools for that), exposing the three steps above as separate API Gateway endpoints. You should then be able to feed the generated Data Bundle definition into the HAT PDA you use for development, and feed the data extracted from that HAT PDA using the bundle into your algorithm for processing.
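Here is a minimal sketch of that separation, assuming a Python Lambda behind an API Gateway proxy integration. `average_words` is a hypothetical stand-in for your algorithm and can be unit-tested without AWS. `handler` only translates between the API Gateway request/response and the algorithm. The request body shape (a JSON list of records with a `data.text` field) is likewise an assumption.

```python
import json
from typing import Any, Dict, List


# Algorithm: plain Python, testable without any AWS tooling.
def average_words(texts: List[str]) -> float:
    """Hypothetical example algorithm: mean word count over text records."""
    if not texts:
        return 0.0
    return sum(len(t.split()) for t in texts) / len(texts)


# Lambda wiring: translates between API Gateway and the algorithm, nothing more.
def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    body = json.loads(event.get("body") or "[]")
    texts = [record["data"]["text"] for record in body]
    result = {"averageWords": average_words(texts)}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

Keeping the handler this thin means a bug in the algorithm shows up in ordinary unit tests rather than only after a deployment.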
Everything else comes down to the details of your own implementation!
AWS Lambda functions, and by extension SHE functions, also have some limitations worth noting:
Each SHE function available in a particular HAT PDA cluster is registered in the HAT's static configuration, which provides the ID of the function along with the version to be used, the endpoint the function is allowed to publish data to, and the details necessary for the HAT PDA to know how to invoke it.
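One way to picture a registration entry is as a small immutable record. In the sketch below, the class, field names and values are all hypothetical, including the use of a URL to describe how to invoke the function; it simply shows the four pieces of information listed above and how the "allowed endpoint" constrains publishing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRegistration:
    """Hypothetical model of one SHE function entry in the static configuration."""

    function_id: str       # ID of the function
    version: str           # version of the function to be used
    publish_endpoint: str  # the one endpoint the function may publish data to
    invocation: str        # how to invoke it, e.g. a URL (assumed representation)

    def may_publish_to(self, endpoint: str) -> bool:
        # Results destined for any other endpoint should be rejected.
        return endpoint == self.publish_endpoint


# A registry keyed by function ID, as a cluster might load it at startup.
REGISTRY = {
    reg.function_id: reg
    for reg in [
        FunctionRegistration(
            function_id="example-summary",
            version="1.0.0",
            publish_endpoint="example/summary",
            invocation="https://functions.example.com/example-summary/1.0.0",
        )
    ]
}
```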
The HAT PDA internally tracks data "events" and uses incoming data events to determine which functions may need to be invoked on the data.
The current approach is rather straightforward: the HAT PDA accumulates a batch of events and checks which endpoints they were for. It then compares that set of endpoints against the functions enabled for the HAT PDA and, if there is an overlap, checks the trigger details for each matching function. A new function execution, with all data matching the bundle since the last execution, is started when the trigger is either individual (the function should run for every individual data record), or periodic and at least the specified period of time has passed since the last invocation.
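The trigger decision can be sketched as a small pure function. This is an illustration rather than the HAT PDA's actual code. The trigger names `"individual"` and `"periodic"`, the data classes, and the choice to treat a periodic function that has never run as due are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class Trigger:
    kind: str                           # "individual" or "periodic" (names assumed)
    period: Optional[timedelta] = None  # required for "periodic"


@dataclass
class EnabledFunction:
    name: str
    endpoints: Set[str]                 # endpoints the function's Data Bundle reads
    trigger: Trigger
    last_execution: Optional[datetime] = None


def functions_to_run(
    event_endpoints: Iterable[str], enabled: List[EnabledFunction], now: datetime
) -> List[str]:
    """Return names of enabled functions due to run for a batch of data events."""
    touched = set(event_endpoints)
    due = []
    for fn in enabled:
        if not fn.endpoints & touched:
            continue  # none of the new events concern this function's data
        if fn.trigger.kind == "individual":
            # Runs for every new record; here it is simply flagged as due.
            due.append(fn.name)
        elif fn.trigger.kind == "periodic":
            never_run = fn.last_execution is None
            if never_run or now - fn.last_execution >= fn.trigger.period:
                due.append(fn.name)
    return due
```

An empty batch of events yields no functions at all, so without incoming data nothing is ever scheduled.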
It is important to note that, unless triggered manually via an API endpoint, a HAT PDA's functions will not run if no new data is coming in, since it is incoming data that generates the data events which in turn trigger functions. In a completely inactive HAT PDA, such functions would never be executed.
Every time the HAT PDA decides it needs to execute a SHE function, it performs three steps:
This results in the generated data becoming available to the HAT PDA owner and to other applications in the same way as any other data, with no need to deal with the complexities of running algorithms, managing dependencies between components or running dedicated infrastructure.
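The individual execution steps are not listed on this page. The hypothetical sketch below is built only from what the page does state: a function receives the data since its last execution, and it may publish only to its registered endpoint. It shows how results could become ordinary HAT PDA data. The `execute` function, the record fields and the in-memory `store` are all illustrative assumptions, not the HAT PDA's implementation.

```python
from datetime import datetime
from typing import Callable, Dict, List

Record = Dict[str, object]


def execute(
    records: List[Record],
    last_execution: datetime,
    invoke: Callable[[List[Record]], List[Record]],
    allowed_endpoint: str,
    store: Dict[str, List[Record]],
) -> int:
    """Hypothetical single execution cycle on the HAT PDA side."""
    # Only data that arrived since the last execution is sent to the function.
    new_records = [r for r in records if r["timestamp"] > last_execution]
    results = invoke(new_records)
    # Reject the whole batch if any result targets a non-permitted endpoint.
    for result in results:
        if result["endpoint"] != allowed_endpoint:
            raise PermissionError(f"function may not publish to {result['endpoint']}")
    # Stored results become ordinary data under the permitted endpoint.
    store.setdefault(allowed_endpoint, []).extend(results)
    return len(results)
```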