Filtering, Querying & Transforming Data

Data querying, filtering and transformation.

Data Bundling

Because the PDA stores any JSON-formatted data it is given, it was important for us to design suitable mechanisms for data retrieval. Data bundling in the PDA allows for very flexible data transformation and filtering when retrieving data:

  • Picking specific parts (fields) of interest out of all the data available to avoid exposing data that is not required for the specific application. If you visualise data in a table, this would look like vertically slicing the table.

  • Filtering only the data that is required, based on values of the data stored. Using the table analogy, this would look like horizontally slicing the table.

  • Interleaving data from different, potentially heterogeneous endpoints – think about location data coming in from a range of different sources, when an application is only concerned with having the most recent longitude and latitude, no matter which application it has come from.

  • Restructuring the data to the desired JSON format on the fly, for example to unify the structure of data from different endpoints being interleaved or to reformat to something more convenient for the developer.
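The table analogy above can be illustrated locally. The following is a minimal Python sketch, not PDA code: the records, field names and helper functions are invented for illustration only.

```python
# Illustrative records only; the field names and values are made up for this sketch.
records = [
    {"source": "phone", "timestamp": 3, "longitude": 0.10, "latitude": 51.67, "accuracy": 12},
    {"source": "watch", "timestamp": 5, "longitude": 0.08, "latitude": 51.66, "accuracy": 30},
    {"source": "phone", "timestamp": 1, "longitude": 0.06, "latitude": 51.66, "accuracy": 8},
]


def pick_fields(rows, fields):
    """Vertical slice: keep only the named columns of each row."""
    return [{name: row[name] for name in fields} for row in rows]


def filter_rows(rows, predicate):
    """Horizontal slice: keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


recent = filter_rows(records, lambda row: row["timestamp"] >= 3)
coordinates = pick_fields(recent, ["longitude", "latitude"])

# Interleaving: the most recent position, no matter which source reported it.
latest = max(records, key=lambda row: row["timestamp"])
```

In the PDA these operations are declared in the request body rather than written as code, as the following sections describe.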

The first step in the process is to understand Data Combinators.

Data Combinators

The API supports a notion of custom data "combinators", with the key feature being data transformation. It allows for:

  • remapping JSON data from different streams into structures chosen by the developer, so that unrelated sources share a consistent structure

  • combining data from multiple feeds into a single response stream

  • ordering of data according to underlying JSON structure fields

  • filtering of data according to underlying JSON values (including text-based search)

  • registering a datapoint with a data-mapping specification and GETing data from the registered endpoint.

Creating a simple combinator

One of the simplest types of data transformation is remapping the data structure. This can be done by creating a combinator:

Request: POST /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token, where $COMBINATOR_NAME is a name you choose for your data combinator. The combinator name can be any valid URL path but must be unique; otherwise the request fails with an error.

Here's a simple example that extracts two fields, longitude and latitude, from the Rumpel locations endpoint and two fields, firstName and lastName, from the Rumpel profile endpoint, unwrapping them into top-level objects:

[
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude"
        }
    },
    {
        "endpoint": "rumpel/profile",
        "mapping": {
            "firstName": "data.firstName",
            "lastName": "data.lastName"
        }
    }
]
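As a rough sketch, the request could be prepared in Python with the standard library as shown below. The host is the example host used elsewhere on this page; the combinator name, the token placeholder and the JSON content type header are assumptions made for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumptions for this sketch: the combinator name and token are placeholders,
# and a JSON content type is assumed for the request body.
PDA_HOST = "https://postman.hubat.net"
COMBINATOR_NAME = "locationsAndProfile"  # hypothetical combinator name
AUTH_TOKEN = "<your-x-auth-token>"

combinator = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    },
    {
        "endpoint": "rumpel/profile",
        "mapping": {"firstName": "data.firstName", "lastName": "data.lastName"},
    },
]


def build_create_request(host, name, token, body):
    """Prepare (but do not send) the POST that registers a combinator."""
    return urllib.request.Request(
        url=f"{host}/api/v2.6/combinator/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-auth-token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(PDA_HOST, COMBINATOR_NAME, AUTH_TOKEN, combinator)
# urllib.request.urlopen(request) would send it to a real PDA.
```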

Fetching data from a Data Combinator

To use the created combinator, send a GET request to /api/v2.6/combinator/$COMBINATOR_NAME with header x-auth-token.

It responds with the same data structure as plain data APIs: with a list of data records wrapped with the basic record details and the data itself remapped according to the registered combinator.

[
  {
    "endpoint": "rumpel/locations",
    "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
    "data": {
      "longitude": "0.101014673709963",
      "latitude": "51.671358277138"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
    "data": {
      "longitude": "0.100905202634514",
      "latitude": "51.674001392439"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
    "data": {
      "longitude": "0.080477950927866",
      "latitude": "51.6658257133844"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
    "data": {
      "longitude": "0.0641066288762133",
      "latitude": "51.6641215101037"
    }
  },
  {
    "endpoint": "rumpel/locations",
    "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
    "data": {
      "longitude": "0.0961801595986785",
      "latitude": "51.6712232446779"
    }
  }
]
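To make the remapping concrete, here is a minimal local sketch of how a mapping could turn a stored record into the shape shown in the response above. It illustrates the idea and is not the PDA's implementation. The stored record layout, the extra accuracy field, the helper names and the handling of missing paths are assumptions.

```python
def resolve_path(document, path):
    """Follow a dot-separated path such as 'data.locations.longitude'."""
    value = document
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # assumption: a missing path yields no value
        value = value[key]
    return value


def apply_mapping(record, mapping):
    """Rebuild the record's data from a mapping of new name -> source path."""
    remapped = {name: resolve_path(record, path) for name, path in mapping.items()}
    return {
        "endpoint": record["endpoint"],
        "recordId": record["recordId"],
        "data": remapped,
    }


# Hypothetical stored record; the nesting under "locations" and the
# "accuracy" field are invented for this sketch.
stored = {
    "endpoint": "rumpel/locations",
    "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
    "data": {
        "locations": {
            "longitude": "0.101014673709963",
            "latitude": "51.671358277138",
            "accuracy": "10",
        }
    },
}
mapping = {
    "longitude": "data.locations.longitude",
    "latitude": "data.locations.latitude",
}
result = apply_mapping(stored, mapping)
```

Fields that are not named in the mapping, such as accuracy here, do not appear in the remapped data.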

Data Filtering

The combinator API allows for powerful filtering of data according to the recorded values. The combinator is created by POSTing a request to /api/v2.6/combinator/$COMBINATOR_NAME, as before. For each source of data, however, you may also define one or more filters in addition to the endpoint and the mapping used to remap the data:

[
  {
    "endpoint": "rumpel/locations",
    "filters": [
      {
        "field": "data.locations.timestamp",
        "transformation": {
          "transformation": "datetimeExtract",
          "part": "hour"
        },
        "operator": {
          "operator": "between",
          "lower": 7,
          "upper": 9
        }
      }
    ]
  }
]

The above example extracts the hour part of the location timestamp and filters for records with the hour between 7 and 9. If you add multiple filters, they are combined with a logical AND: a data record has to match all filters to be included in the result. Every filter consists of three fields:

| Parameter | Type | Meaning |
| --- | --- | --- |
| field | String | The JSON path of the field to use for filtering – it can be a simple JSON value, an array or an object. |
| transformation | Transformation Object | Optionally transforms the field in question before applying a filter. You can find the supported transformations below. |
| operator | Operator Object | The filtering operator. You can find the supported operators below. |

  • transformation – currently supported transformations:

    • identity – keep the value as-is, effect is the same as if transformation was not defined

    • datetimeExtract with part – extract part of a date from an ISO 8601 formatted date field

    • timestampExtract with part – extract part of a date from a UNIX timestamp date field

    • searchable – convert the field to searchable text. Must be used together with the find operator below

  • operator – different operator types:

    • in, together with a value field – checks whether the field is in (is contained by) value

    • contains, together with a value field – checks whether the field contains value

    • between, together with lower and upper values – checks whether lower < field < upper

    • find, together with a search field set to the search string – performs a text-based search. Must be used together with the searchable transformation above.
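The filter semantics described above can be sketched locally in Python. This is an illustration under assumptions, not the PDA's implementation: the function names are invented, part names other than hour are assumed to match Python's datetime attributes, the between check uses the strict inequality stated above, and the searchable/find pair is left out because its matching rules are not specified here.

```python
from datetime import datetime, timezone


def resolve_path(document, path):
    """Follow a dot-separated JSON path through nested dictionaries."""
    for key in path.split("."):
        document = document[key]
    return document


def transform(value, spec):
    """Apply an optional transformation object to a field value."""
    kind = (spec or {}).get("transformation", "identity")
    if kind == "identity":
        return value
    if kind == "datetimeExtract":
        moment = datetime.fromisoformat(value)  # ISO 8601 string
    elif kind == "timestampExtract":
        moment = datetime.fromtimestamp(value, tz=timezone.utc)  # UNIX seconds
    else:
        raise ValueError(f"transformation not covered by this sketch: {kind}")
    return getattr(moment, spec["part"])  # e.g. "hour"; other names assumed


def matches(value, spec):
    """Evaluate an operator object against a (transformed) field value."""
    kind = spec["operator"]
    if kind == "in":
        return value in spec["value"]
    if kind == "contains":
        return spec["value"] in value
    if kind == "between":
        return spec["lower"] < value < spec["upper"]
    raise ValueError(f"operator not covered by this sketch: {kind}")


def passes(record, filters):
    """A record is kept only if it matches every filter (logical AND)."""
    return all(
        matches(
            transform(resolve_path(record, f["field"]), f.get("transformation")),
            f["operator"],
        )
        for f in filters
    )


filters = [
    {
        "field": "data.locations.timestamp",
        "transformation": {"transformation": "datetimeExtract", "part": "hour"},
        "operator": {"operator": "between", "lower": 7, "upper": 9},
    }
]
# Hypothetical records with ISO 8601 timestamps.
morning = {"data": {"locations": {"timestamp": "2017-11-05T08:15:00+00:00"}}}
evening = {"data": {"locations": {"timestamp": "2017-11-05T19:40:00+00:00"}}}
kept = [r for r in (morning, evening) if passes(r, filters)]
```

With the strict comparison used in this sketch, only records whose hour is 8 pass the example filter.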

The illustrated ways of creating data combinators hopefully provide you with a comprehensive tool to extract data in any way you like. The next step is to build up a layer of bundles on top of them to allow for retrieving a bigger variety of data in one big bundle.

Data Bundles

Data Bundles add a thin layer around combinators, useful in two ways:

  1. Retrieving data from different combinators into explicitly named properties

  2. Accepting orderBy and limit parameters to control how many data points are returned for a specific bundle property

The profile and location data from the previous examples are clearly very distinct, but an application may still benefit from having both at the same time. For instance, it may only care about the most recent information on the user's profile and their 5 most recent locations. This can be achieved with a POST request to https://postman.hubat.net/api/v2.6/data-bundle/localprofile with header x-auth-token and body:

{
  "profile": {
    "endpoints": [
      {
        "endpoint": "rumpel/profile"
      }
    ],
    "limit": 1
  },
  "location": {
    "endpoints": [
      {
        "endpoint": "rumpel/locations",
        "mapping": {
          "longitude": "data.locations.longitude",
          "latitude": "data.locations.latitude"
        }
      }
    ],
    "limit": 5
  }
}

The response includes the specific data requested:

{
  "profile": [
    {
      "endpoint": "rumpel/profile",
      "recordId": "9b136020-372a-4777-81f9-2c4ce6925aea",
      "data": {
        "profile": {
          "website": {
            "link": "https://example.com",
            "private": "false"
          },
          "nick": {
            "private": "true",
            "name": ""
          },
          "primary_email": {
            "value": "testuser@example.com",
            "private": "false"
          },
          "private": "false",
          "youtube": {
            "link": "",
            "private": "true"
          },
          "address_global": {
            "city": "London",
            "county": "",
            "country": "UK",
            "private": "true"
          },
          "age": {
            "group": "",
            "private": "true"
          },
          "personal": {
            "first_name": "",
            "private": "false",
            "preferred_name": "Test",
            "last_name": "User",
            "middle_name": "",
            "title": ""
          },
          "blog": {
            "link": "",
            "private": "false"
          },
          "facebook": {
            "link": "",
            "private": "false"
          },
          "address_details": {
            "no": "",
            "street": "",
            "private": "false",
            "postcode": ""
          },
          "emergency_contact": {
            "first_name": "",
            "private": "true",
            "relationship": "",
            "last_name": "",
            "mobile": ""
          },
          "alternative_email": {
            "private": "true",
            "value": ""
          },
          "fb_profile_photo": {
            "private": "false"
          },
          "twitter": {
            "link": "",
            "private": "false"
          },
          "about": {
            "body": "A short bio about me shown on my PHATA",
            "private": "false",
            "title": "Me the Test User"
          },
          "mobile": {
            "no": "",
            "private": "true"
          },
          "gender": {
            "type": "",
            "private": "true"
          }
        }
      }
    }
  ],
  "location": [
    {
      "endpoint": "rumpel/locations",
      "recordId": "e965e022-6613-476a-a0cd-1f587a41b148",
      "data": {
        "longitude": "0.101014673709963",
        "latitude": "51.671358277138"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "fcf1a26b-e49f-4457-915b-156e14140f38",
      "data": {
        "longitude": "0.100905202634514",
        "latitude": "51.674001392439"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "8f7afa92-39e2-48ab-8028-f5aebaa9918e",
      "data": {
        "longitude": "0.080477950927866",
        "latitude": "51.6658257133844"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "d3a6f04b-4df6-4888-a7b0-c1d5ca272de9",
      "data": {
        "longitude": "0.0641066288762133",
        "latitude": "51.6641215101037"
      }
    },
    {
      "endpoint": "rumpel/locations",
      "recordId": "6a858d87-899e-4961-b722-0738d07c755e",
      "data": {
        "longitude": "0.0961801595986785",
        "latitude": "51.6712232446779"
      }
    }
  ]
}

To keep the example simple, it does not include the more complex data combinators covered in the previous step. However, you will notice that the endpoints property has exactly the same format as the body of a request for creating a new combinator.

Like Data Combinators, Data Bundles can only be used directly by privileged applications such as the personal data dashboard. This leads us to Data Debits for consented data sharing, as Bundles are the format used to specify the data requested from the user.
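Because each bundle property wraps a combinator-style list, a bundle body can be assembled from existing combinator definitions. The following Python sketch shows one way to do that and to prepare the request without sending it. The helper name, the token placeholder and the JSON content type header are assumptions; orderBy is left out because its value format is not shown on this page.

```python
import json
import urllib.request


def bundle_property(combinator_body, limit=None):
    """Wrap a combinator-style list of endpoints as one bundle property."""
    prop = {"endpoints": combinator_body}
    if limit is not None:
        prop["limit"] = limit
    return prop


# The same structures that would be POSTed to create combinators.
profile_combinator = [{"endpoint": "rumpel/profile"}]
location_combinator = [
    {
        "endpoint": "rumpel/locations",
        "mapping": {
            "longitude": "data.locations.longitude",
            "latitude": "data.locations.latitude",
        },
    }
]

bundle = {
    "profile": bundle_property(profile_combinator, limit=1),
    "location": bundle_property(location_combinator, limit=5),
}

request = urllib.request.Request(
    url="https://postman.hubat.net/api/v2.6/data-bundle/localprofile",
    data=json.dumps(bundle).encode("utf-8"),
    # Placeholder token; the content type header is an assumption.
    headers={"x-auth-token": "<your-x-auth-token>", "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it to a real PDA.
```

The resulting bundle dictionary has the same content as the request body shown earlier in this section.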
