JSV Validator Form

At the simplest level, there are three classes:

content-item.json, which holds 'raw' data and its transformed versions, such as 'cleansed' and 'ner'.
http-metadata.json, which holds metadata from retrieving a document from a web server.
corpus-item.json, which has a content-item called 'body' and a 'source_metadata' property that may be an instance of http-metadata.json.
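
As a rough illustration of how these three classes nest, the following Python sketch builds a corpus-item-like dictionary. Only the property names 'raw', 'cleansed', 'ner', 'body', and 'source_metadata' come from the description above. The helper functions and the 'url' and 'status' fields of the http metadata are hypothetical placeholders, not taken from the actual schema files.

```python
import json


def make_content_item(raw, cleansed=None, ner=None):
    """Build a content-item-like dict: 'raw' data plus optional transformed versions."""
    item = {"raw": raw}
    if cleansed is not None:
        item["cleansed"] = cleansed
    if ner is not None:
        item["ner"] = ner
    return item


def make_corpus_item(body, source_metadata):
    """Build a corpus-item-like dict from a content-item and a metadata dict."""
    return {"body": body, "source_metadata": source_metadata}


# Hypothetical http-metadata fields; the real http-metadata.json may differ.
http_metadata = {"url": "http://example.com/doc.html", "status": 200}

corpus_item = make_corpus_item(
    make_content_item("<p>Hello</p>", cleansed="Hello"),
    http_metadata,
)

print(json.dumps(corpus_item, indent=2, sort_keys=True))
```

Serialized this way, the instance could be pasted into the JSON Instance field and checked against corpus-item.json, assuming the placeholder metadata fields are replaced with the ones the real schema defines.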

The TREC KBA stream corpus consists of instances of stream-item.json, which extends corpus-item.json with 'stream_time' and 'stream_id' to give the corpus a temporal ordering.
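
The temporal ordering can be sketched in Python as follows. This is a minimal sketch that assumes 'stream_time' can be compared as a plain number (for example, seconds since the epoch); the real stream-item.json may structure 'stream_time' differently. The helper name and the sample values are hypothetical.

```python
def make_stream_item(corpus_item, stream_time, stream_id):
    """Extend a corpus-item-like dict with 'stream_time' and 'stream_id'."""
    item = dict(corpus_item)  # shallow copy, so the input dict is not modified
    item["stream_time"] = stream_time
    item["stream_id"] = stream_id
    return item


# Two hypothetical corpus items with made-up identifiers and timestamps.
later = make_stream_item(
    {"body": {"raw": "second"}, "source_metadata": {}},
    stream_time=1325376060,
    stream_id="item-b",
)
earlier = make_stream_item(
    {"body": {"raw": "first"}, "source_metadata": {}},
    stream_time=1325376000,
    stream_id="item-a",
)

# Sorting on 'stream_time' recovers the temporal order of the stream.
ordered = sorted([later, earlier], key=lambda item: item["stream_time"])
print([item["stream_id"] for item in ordered])
```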

The TREC KBA stream corpus contains three subcorpora, each with its own kind of 'source_metadata':

news-metadata.json extends http-metadata.json with 'language'.
linking-metadata.json extends http-metadata.json with 'queries' and 'shorten_events'.
social-metadata.json is not an extension of http-metadata.json. It contains rich metadata generated by the social media feed aggregator.
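
One way to picture these extension relationships is the Python sketch below. It assumes the schemas express inheritance with an 'extends' keyword holding a '$ref' to the parent file, as JSON Schema draft-03 allows; whether the actual files are written this way is an assumption. The schemas are heavily abbreviated, the 'url' property of http-metadata.json is a hypothetical placeholder, and the two helper functions are illustrative only.

```python
# Abbreviated, hypothetical stand-ins for the schema files, keyed by filename.
SCHEMAS = {
    "http-metadata.json": {
        "properties": {"url": {"type": "string"}},  # 'url' is a placeholder
    },
    "news-metadata.json": {
        "extends": {"$ref": "http-metadata.json"},
        "properties": {"language": {}},
    },
    "linking-metadata.json": {
        "extends": {"$ref": "http-metadata.json"},
        "properties": {"queries": {}, "shorten_events": {}},
    },
    "social-metadata.json": {
        "properties": {},  # rich aggregator metadata, omitted here
    },
}


def all_properties(name):
    """Collect properties from a schema and every schema it extends."""
    schema = SCHEMAS[name]
    props = {}
    parent = schema.get("extends")
    if parent is not None:
        props.update(all_properties(parent["$ref"]))
    props.update(schema.get("properties", {}))
    return props


def extends_http(name):
    """Return True if the schema is, or extends, http-metadata.json."""
    while name is not None:
        if name == "http-metadata.json":
            return True
        parent = SCHEMAS[name].get("extends")
        name = parent["$ref"] if parent is not None else None
    return False


for schema_name in sorted(SCHEMAS):
    print(schema_name, sorted(all_properties(schema_name)), extends_http(schema_name))
```

In this sketch the news and linking schemas inherit the placeholder http property, while social-metadata.json stands alone, which mirrors the relationships described in the list above.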

JSON Schema

JSON Instance