Collections Account Inbound Schema: JSON Schema + OpenSearch Query Patterns

Inbound data is your collections strategy, whether you admit it or not

In collections, most “strategy” problems are really data shape problems.

If the inbound account payload is inconsistent, you end up with:

accounts routed into the wrong treatment
exceptions handled by tribal knowledge
broken worklists
endless rework when arrears and balances do not reconcile

This note is about a pattern that holds up: validate the inbound payload with JSON Schema, then index it into OpenSearch in a way that supports the queries collections teams actually run.

The collections view of an account record

A lending schema often starts from the loan. A collections schema starts from the account in arrears with enough context to decide:

is it pursuable?
what is the exception state?
what is the next best action?
where should it land (agent worklist, digital journey, hold queue)?

That is why fields like pursuable, monthsInArrears, arrearsStartDate, paymentMethod, and status matter more than most “nice to have” loan metadata.

JSON Schema: validate early, fail predictably

The goal of JSON Schema here is not elegance. It is to create a predictable contract between host and collections.

Minimum rules that usually pay off:

enforce required identity fields
constrain enums for exception status and payment method
validate that numeric money fields are not negative
do not allow empty identifiers to creep in

Below is a cleaned-up version of the inbound schema, still close to your original structure, but tuned for collections and indexing.

Inbound account schema (collections host payload)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Collections Account File (Mortgage / Secured Lending)",
  "type": "object",
  "additionalProperties": false,
  "properties": {
   
  },
  "required": ["accountReference", "pursuable"]
}
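As a rough sketch of how the properties block might be filled in for the fields this note discusses, the fragment below applies the minimum rules listed earlier. The field names come from this note; the enum values for status and paymentMethod (beyond Complaint, Fraud, and Dispute, which appear in the queries later) and the date format are assumptions for illustration, not the host's actual contract.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$comment": "Illustrative sketch only. Enum values and the date format are assumptions.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "accountReference": { "type": "string", "minLength": 1 },
    "pursuable": { "type": "number", "minimum": 0 },
    "arrears": { "type": "number", "minimum": 0 },
    "monthsInArrears": { "type": "integer", "minimum": 0 },
    "arrearsStartDate": { "type": "string", "format": "date" },
    "status": { "enum": ["None", "Complaint", "Fraud", "Dispute"] },
    "paymentMethod": { "enum": ["DirectDebit", "StandingOrder", "Card", "Other"] }
  },
  "required": ["accountReference", "pursuable"]
}
```

Two caveats on this sketch: minLength: 1 rejects empty strings but still accepts a string of spaces, and many validators treat format as advisory unless format checking is explicitly switched on.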

Two practical additions that help in collections

Strictness on unknown fields

additionalProperties: false prevents silent drift. Without it, your schema becomes a suggestion.

{
  "additionalProperties": false
}
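To make the drift concrete, here is a hypothetical payload (the values are made up) checked against a schema that declares accountReference, pursuable, and monthsInArrears and sets additionalProperties to false. The third key is misspelled.

```json
{
  "accountReference": "ACC-000123",
  "pursuable": 1250.00,
  "monthsInArears": 3
}
```

With additionalProperties: false, validation fails on the unknown key monthsInArears, so the defect surfaces at the boundary. Without it, the payload would pass, the real monthsInArrears field would simply be absent, and the account could drop out of any worklist that filters on months in arrears.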

Hard constraints on money fields

At a minimum, set minimum: 0 on every money field. In collections, negative pursuable and negative arrears usually indicate a mapping defect or a credit balance scenario that needs explicit handling, not a silent pass.

{
  "minimum": 0
}
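One way to keep the money rule consistent across fields is to define it once and reference it. The sketch below uses the draft-07 definitions keyword with $ref; the definition name money is a hypothetical label chosen for this example, and the three fields shown are ones named elsewhere in this note.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "money": {
      "$comment": "Hypothetical shared definition; the name is not part of the host contract.",
      "type": "number",
      "minimum": 0
    }
  },
  "type": "object",
  "properties": {
    "pursuable": { "$ref": "#/definitions/money" },
    "arrears": { "$ref": "#/definitions/money" },
    "monthlyInstalment": { "$ref": "#/definitions/money" }
  }
}
```

If credit balances are a legitimate state for a particular field, that field can reference a different definition, which keeps the exception explicit instead of relaxing the rule everywhere.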

Indexing into OpenSearch: design for worklists and segmentation

A collections platform tends to run the same query families repeatedly:

“show me accounts due for action”
“show me exceptions that must be held”
“show me high pursuable accounts in late stage arrears”
“show me Direct Debit failures with recent missed payments”
“show me the queue for a portfolio, then break it down by stage”

That pushes you toward a document model where:

the inbound account record is the primary document
frequently filtered fields are keyword or numeric types
dates are date types
addresses and free text are either not indexed or mapped carefully

Mapping principles that usually hold up

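Map enums as keyword (status, paymentMethod, repaymentType, pillar, brandName)
Map money fields as scaled_float (or double if you must; scaled_float stores each value as a long multiplied by a fixed scaling factor, so with a factor of 100 amounts are held to two decimal places)
Map dates as date
Map identifiers as keyword (externalAccountReference, accountNumber)
Avoid analysed text mappings on fields you filter on; keyword fields match exact values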

Example OpenSearch mapping sketch

{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "accountNumber": { "type": "keyword" },

      "pursuable": { "type": "scaled_float", "scaling_factor": 100 },
      "balance": { "type": "scaled_float", "scaling_factor": 100 },
      "arrears": { "type": "scaled_float", "scaling_factor": 100 },
      "score": { "type": "integer" },
      "monthsInArrears": { "type": "integer" },

      "status": { "type": "keyword" },
      "paymentMethod": { "type": "keyword" },
      "repaymentType": { "type": "keyword" },

      "arrearsStartDate": { "type": "date" },
      "lastPaymentDate": { "type": "date" },
      "monthlyInstalment": { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}

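A note on "dynamic": "strict": it is the same mindset as additionalProperties: false in JSON Schema. Drift becomes visible early, because a document carrying an unmapped field is rejected at index time instead of silently extending the mapping. The flip side is that every field you intend to index (for example accountReference, which the schema requires but this sketch omits) must be added to the mapping first.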
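The earlier point about addresses and free text can be sketched as a mapping fragment. The field names address and agentNotes are hypothetical; the inbound payload may name them differently or not carry them at all.

```json
{
  "mappings": {
    "dynamic": "strict",
    "_meta": {
      "note": "Sketch only; 'address' and 'agentNotes' are hypothetical field names."
    },
    "properties": {
      "address": { "type": "object", "enabled": false },
      "agentNotes": { "type": "text", "index": false }
    }
  }
}
```

With enabled set to false, the address object is kept in _source but its contents are not parsed or indexed. With index set to false, agentNotes is returned with the document but cannot be searched. Both choices keep personal data out of the inverted index unless someone deliberately decides it belongs there.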

Query patterns that match collections work

These examples assume the inbound record is indexed as one document per account.

Build a daily worklist: pursuable accounts, not in exception status

{
  "query": {
    "bool": {
      "filter": [
        { "range": { "pursuable": { "gt": 0 } } },
        { "range": { "monthsInArrears": { "gte": 1 } } }
      ],
      "must_not": [
        { "terms": { "status": ["Complaint", "Fraud", "Dispute"] } }
      ]
    }
  },
  "sort": [
    { "pursuable": "desc" },
    { "monthsInArrears": "desc" }
  ],
  "size": 50
}

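This is the backbone of a queue that stays stable even when upstream is noisy: every clause runs in filter context, so membership depends only on field values and never on relevance scoring. If you page through the queue, add a unique keyword field such as accountNumber as a final sort key so that ties break the same way on every request.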
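The same backbone extends to the "Direct Debit failures with recent missed payments" family. The sketch below rests on three assumptions: DirectDebit is the enum value the host sends for paymentMethod, "recent missed payment" means no payment recorded in the last 30 days, and lastPaymentDate is populated for accounts that have ever paid.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "paymentMethod": "DirectDebit" } },
        { "range": { "monthsInArrears": { "gte": 1 } } },
        { "range": { "lastPaymentDate": { "lt": "now-30d/d" } } }
      ],
      "must_not": [
        { "terms": { "status": ["Complaint", "Fraud", "Dispute"] } }
      ]
    }
  },
  "sort": [
    { "lastPaymentDate": "asc" },
    { "accountNumber": "asc" }
  ],
  "size": 50
}
```

Accounts with no lastPaymentDate at all will not match the range filter, so "never paid" needs its own clause if it should appear in this queue.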

Exceptions queue: isolate holds with clear reasons

{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "status": ["Complaint", "Fraud", "Dispute"] } }
      ]
    }
  },
  "sort": [
    { "arrearsStartDate": "asc" }
  ],
  "size": 50
}

Collections teams need this to be boring and predictable.
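For a summary view of the holds, an aggregation over the same filter gives a count per exception reason. This is a sketch against the mapping above; the aggregation names holds_by_status, total_pursuable, and oldest_arrears_start are arbitrary labels chosen for this example.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "status": ["Complaint", "Fraud", "Dispute"] } }
      ]
    }
  },
  "aggs": {
    "holds_by_status": {
      "terms": { "field": "status" },
      "aggs": {
        "total_pursuable": { "sum": { "field": "pursuable" } },
        "oldest_arrears_start": { "min": { "field": "arrearsStartDate" } }
      }
    }
  }
}
```

Setting size to 0 returns only the buckets, which is usually what a dashboard or a daily hold report needs.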

Where this pattern usually breaks

You index values that are not semantically stable. If pursuable is sometimes “arrears” and sometimes “balance”, your queries are correct and your results are wrong. That is a definition problem, not a technical one.
You allow the host to drift without feedback. If you accept unknown fields and tolerate null identifiers, you will build a platform that can never trust itself.
You treat OpenSearch as the source of truth. OpenSearch should support fast retrieval and worklists. It should not be the system of record.

The practical takeaway

For collections platforms, the inbound account record is not just data. It is the foundation of routing, prioritisation, and compliance.

Validate it with JSON Schema so drift is visible early.
Index it into OpenSearch so worklists and segmentation stay fast and consistent.
Treat exceptions as first-class, because collections always finds the edge cases.