As JSON grew in popularity, developers ran into a problem: how to make sure that objects serialised to this text format by one application can be de-serialised correctly by any other. Different software solutions use very different programming languages and tools, so parser implementations and data transformations can vary significantly between them. The problem is not new; it applies to all text-based data formats, XML included. The solution for XML was to introduce a dedicated vocabulary of XML elements with standardised meaning that describes what a valid XML document should look like. It is called XML Schema (or XSD).
The same approach was introduced for JSON with the JSON Schema specification (www.json-schema.org). A JSON schema is a meta-model of the text format. Using the same notation as the data itself, it describes the structure of the exchanged object’s data and the restrictions imposed on attribute values (strings, date-time formats, numeric formats and ranges). A JSON document is considered “valid” only when it fully complies with the rules defined in the schema.
Meta-model and data
Because the schema is written in the same notation as the JSON data it describes, the same parsing tools and libraries can handle both.
Here is an example of what a JSON schema looks like:
"description": "The person's first name."
"description": "The person's last name."
"description": "Age in years which must be equal to or greater than zero.",
And here is the actual exchanged data:
The schema is clearly more verbose than the JSON data itself, but its purpose is to give that data proper structure and semantic meaning. It is meant to be an instruction manual that tells software applications how to interpret the JSON data they exchange.
For the schema to work properly, it has to follow a specification, and that specification needs to be adopted by all software developers who use JSON Schema as a validation rulebook. The goal of such a specification is to be as strict as possible, because any ambiguity in interpretation compromises the integrity of JSON-based communication. Web browsers have already shown what happens otherwise: the same website can look different from one browser to another only because the browsers interpret HTML and CSS differently.
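Because the schema and the data share one notation, a single parser can read both. The sketch below illustrates this with Python's standard `json` module. The property names (`firstName`, `lastName`, `age`), their types, the `minimum` bound and the sample values are assumptions made for illustration; only the three description strings come from the example above. The `validate` function is a hypothetical helper that understands just the `type`, `properties` and `minimum` keywords, not a full JSON Schema implementation.

```python
import json

# Assumed schema: names, types and "minimum" are illustrative guesses;
# only the description strings are taken from the article.
SCHEMA_TEXT = """
{
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {"type": "string", "description": "The person's first name."},
    "lastName": {"type": "string", "description": "The person's last name."},
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
"""

# Assumed instance document (sample values are made up).
DATA_TEXT = '{"firstName": "John", "lastName": "Doe", "age": 21}'

# Only this small subset of JSON Schema types is supported by the sketch.
JSON_TYPES = {"string": str, "integer": int, "object": dict}


def validate(instance, schema):
    """Return a list of error messages; an empty list means 'valid'."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(instance, JSON_TYPES[expected])
        # bool is a subclass of int in Python, but JSON booleans are not integers.
        if expected == "integer" and isinstance(instance, bool):
            matches = False
        if not matches:
            return [f"expected {expected}, got {type(instance).__name__}"]
    minimum = schema.get("minimum")
    if minimum is not None and isinstance(instance, int) and instance < minimum:
        errors.append(f"value {instance} is below minimum {minimum}")
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(f"{name}: {e}" for e in validate(instance[name], subschema))
    return errors


# The same parser handles the schema and the data.
schema = json.loads(SCHEMA_TEXT)
person = json.loads(DATA_TEXT)
print(validate(person, schema))
```

In a real project a complete validator library would replace the hand-written `validate`. The sketch only shows that the schema is ordinary JSON that drives the checks.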
Validation of the exchanged data — “the why”
With object-oriented programming (OOP) tools and development environments, it is the responsibility of the compiler (or interpreter) to validate attribute values, lists (collections) and name-value structures. An object in an OOP language first has to be defined as a type, that is, a template of data members and executable code. Each data member can hold a simple value (text, number, date, etc.), another object (or a reference to one), or a list of simple values or objects.
The software developer describes that object skeleton, and the compiler uses it to enforce validation whenever the object’s data is accessed or modified. Passing objects as parameters to functions actually resembles sending object data (in JSON format, for example), but the whole process is conveniently encapsulated and hidden even from the programmer.
If an object’s data needs to be sent to another application, though, and especially if that application runs on another computer so that the data has to travel across a network, validating the data is outside the scope of the compiler. The applications that exchange the data might not even be written in the same programming language. This is why verification matters so much when data is serialised to JSON, sent over a network and then de-serialised from JSON back into memory as an object.
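A small sketch can make the gap concrete. It serialises a typed object to JSON and de-serialises it on the "other side", where nothing but an explicit check protects the receiver. The `Person` type, the `send` and `receive` helpers and the field names are hypothetical, and the checks are written by hand here in place of a schema-driven validator.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Person:  # hypothetical type, used only for illustration
    first_name: str
    age: int


def send(person):
    """Serialise the object to JSON text, as it would travel over a network."""
    return json.dumps(asdict(person))


def receive(text):
    """De-serialise JSON text; return (Person or None, list of problems)."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        return None, ["payload is not a JSON object"]
    problems = []
    if not isinstance(raw.get("first_name"), str):
        problems.append("first_name must be a string")
    age = raw.get("age")
    # Reject booleans explicitly: bool is a subclass of int in Python.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        problems.append("age must be a non-negative integer")
    if problems:
        return None, problems
    return Person(raw["first_name"], raw["age"]), []


# A well-formed round trip restores an equal object.
restored, problems = receive(send(Person("Ann", 30)))
print(restored, problems)

# A payload produced by some other application: the type annotations on
# Person would not have stopped this value from being loaded unchecked.
rejected, problems = receive('{"first_name": "Ann", "age": "thirty"}')
print(rejected, problems)
```

Without the checks in `receive`, the second payload would produce a `Person` whose `age` is a string, and the error would surface much later and far from its cause.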
The JSON schema is the specification that JSON-validation code (a utility or library) strictly follows during this process. Almost all cloud-based software development platforms now provide such tooling, at varying levels of sophistication. The schema is the sieve that filters out improper object attributes, invalid values and incorrect object structures. When the schema is properly defined, errors in object data exchange can be quickly identified and tackled.
The schema can even be synthesised from the object’s type as defined in the OOP language. Some JSON schema tools read special comments in the code where the object’s type is defined and use them as instructions for constructing the schema. Producing the JSON schema files takes very little effort from the programmer, and the later benefit far outweighs the upfront work of keeping schemas well documented.
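As a rough illustration of synthesising a schema from a type, the sketch below walks the fields of a Python dataclass and emits a schema-like dictionary. It uses dataclass field metadata as a stand-in for the special code comments mentioned above. The `Person` type, the `schema_for` helper and the type mapping are assumptions for this example and do not reproduce any particular tool.

```python
import json
from dataclasses import dataclass, field, fields

# Assumed mapping from Python types to JSON Schema type names (simple types only).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Person:  # hypothetical type; metadata plays the role of schema annotations
    first_name: str = field(metadata={"description": "The person's first name."})
    age: int = field(metadata={"description": "Age in years.", "minimum": 0})


def schema_for(cls):
    """Build a JSON-Schema-like dict from a dataclass with simple field types."""
    properties = {}
    for f in fields(cls):
        prop = {"type": PY_TO_JSON[f.type]}
        prop.update(f.metadata)  # copy description, minimum, ... as written
        properties[f.name] = prop
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": [f.name for f in fields(cls)],
    }


print(json.dumps(schema_for(Person), indent=2))
```

The generated dictionary can be written to a file and published like any hand-written schema. Nested objects and lists would need extra handling that this sketch leaves out.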
Validation of messages — “the where”
There are three typical use cases where schema-based validation comes in very handy.
- Our software application can read and update JSON object data in a database (a document database), while the same data is added or modified by a different set of applications (or third-party tools). This scenario requires our application to strictly verify the JSON content it retrieves from the database before using it to de-serialise objects in memory. Such an approach catches runtime errors that could not be foreseen at development time (the other applications that modify the same database might have been reworked or updated).
- Our software application can read (and persist) JSON object data used as configuration files. Being a well-defined, structured text format, JSON can be used for exchanging (or generating) application configuration files (settings) instead of other, less strict formats (YAML, name-value lines, INI files, etc.). The advantage of JSON is that the configuration can easily be validated against a JSON schema, so the application does not accept wrong configuration values. Additionally, JSON is very database-friendly if we want to keep the configuration in a document-type database.
- Our application can accept API requests with a JSON payload over HTTP (or through inter-process communication on our own servers, if our micro-service architecture allows it). When JSON is used for API request payloads, it is imperative to validate the content before the request is handled. Acting as the first line of defence for object data, the schema filters out bad API requests before they even enter an execution queue. In addition, when the JSON schema is exported and published on a web portal, software developers can write consumer applications (cloud services, mobile apps, web portals) that send only correct JSON payloads (or at least pre-test them before trying to communicate with the API provider).
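The API use case can be sketched as a small gate in front of an execution queue: a request body is parsed, checked against a schema, and only queued when it passes. Everything named here is hypothetical, including the `PAYMENT_SCHEMA` contents, the `check` and `handle_request` helpers and the status codes. The checker covers only the `required`, `type` and `minimum` keywords.

```python
import json

# Hypothetical request schema using a few JSON Schema keywords.
PAYMENT_SCHEMA = {
    "required": ["customerId", "amount"],
    "properties": {
        "customerId": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # JSON numbers map to int or float; booleans are excluded explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def check(payload, schema):
    """Return a list of validation errors for a parsed JSON payload."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = [
        f"missing required property '{name}'"
        for name in schema["required"]
        if name not in payload
    ]
    for name, rule in schema["properties"].items():
        if name not in payload:
            continue
        value = payload[name]
        if not TYPE_CHECKS[rule["type"]](value):
            errors.append(f"'{name}' must be of type {rule['type']}")
        elif "minimum" in rule and value < rule["minimum"]:
            errors.append(f"'{name}' must be >= {rule['minimum']}")
    return errors


def handle_request(body, queue):
    """Validate a raw request body; enqueue it only if it is valid."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, ["body is not valid JSON"]
    errors = check(payload, PAYMENT_SCHEMA)
    if errors:
        return 400, errors  # rejected before it reaches the queue
    queue.append(payload)
    return 202, []


queue = []
print(handle_request('{"customerId": "c-1", "amount": 12.5}', queue))
print(handle_request('{"amount": -3}', queue))
print(len(queue))
```

The same `check` call would serve the other two use cases, applied to documents read from a database or to a configuration file loaded at start-up.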
Tools and software libraries
To support a strong specification effort, many tools and libraries have been developed for all major programming environments. They usually fall into one of the following categories:
- Schema implementation validators: verify that the code which enforces JSON schema validation works according to the JSON schema specification.
- Hyper-schema tools: generate API documentation from a set of related schemas (where object types in one schema can refer to object types in another schema).
- Generators of schemas: synthesise the JSON schema from the source code that describes an object type (or from a database, relational or document-based).
- Generators from schemas: create skeleton object-type code from an existing JSON schema, or produce a database schema set.
- Common tools: transform (split, merge) schemas, convert JSON value formatting, test JSON before production, edit schemas.
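To give a feel for the "generators from schemas" category, the sketch below builds a skeleton Python type from a schema at run time with `dataclasses.make_dataclass`. The schema contents, the `dataclass_from_schema` helper and the type mapping are assumptions for illustration. Real generators typically emit source files and handle nested types, which this sketch does not.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Assumed mapping from JSON Schema type names to Python types (simple types only).
JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}

# Hypothetical input schema.
PERSON_SCHEMA = {
    "title": "Person",
    "type": "object",
    "required": ["firstName", "age"],
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "integer"},
    },
}


def dataclass_from_schema(schema):
    """Create a dataclass whose fields mirror the schema's flat properties."""
    required = set(schema.get("required", []))
    required_specs, optional_specs = [], []
    for name, prop in schema["properties"].items():
        py_type = JSON_TO_PY[prop["type"]]
        if name in required:
            required_specs.append((name, py_type))
        else:
            # Optional properties default to None and must come after required ones.
            optional_specs.append((name, Optional[py_type], field(default=None)))
    return make_dataclass(schema["title"], required_specs + optional_specs)


Person = dataclass_from_schema(PERSON_SCHEMA)
person = Person(firstName="John", age=21)
print(person)
print([f.name for f in fields(Person)])
```

Properties not listed under `required` become optional fields, so the generated type accepts the same documents that the schema would.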
The growing impact of the JSON format on web-based and cloud-based software is the reason schema validation, the specification and the accompanying tools and libraries have to become an integral part of the software development, testing and deployment process.
eCollect’s IT team has adopted the JSON schema as an essential tool to produce our API documentation (https://ecollect.org/docs/v2/api/). The documentation is online, always up-to-date and part of our integration service. The validation of all incoming/outgoing API exchange is fully automated with the digital services we provide. The easy-to-explore set of entry points, object types and their relations allows for straightforward software integration and full digital automation with all our customers.
Author: Nikolay Belichki, Technology Engineer