#devops
May 23, 2022

By Showmaxers, for Showmaxers: Is '{JSONB}' a Silver Bullet?

Showmax Engineer
Showmax Engineer
JSONBJSONTablesexperiment
Maximalfocus
By Showmaxers, for Showmaxers: Is '{JSONB}' a Silver Bullet?

We’ve managed to launch a pretty useful series of talks sharing knowledge and teaching skills some of us have and many more could benefit from. The series has been offered internally for several years now, and for more than a year, we’ve been sharing the talks publicly.

Another edition focuses on JSONB and is led by Angus Dippenaar, DevOps engineer.

Contents

The talk offers an explanation of JSON and JSONB, what the difference is between them, what is cool and where their limitations are. You will learn when to use JSONB and when to avoid it. What is interesting is the comparison of JSON, JSONB, and Tables.

We have listed a few points for each of the parts of the talk – check out the marked time within the video to learn more.

What is JSONB

video at 0:33

Imagine JSON but query it. It is revolutionary.
JSON was introduced in Postgres 9.4 and JSONB in Postgres 9.5
JSON is just stored as text, and is converted to a type when the value is read. JSONB is stored in a binary format.
If we compared JSONB to PostgreSQL types, e.g. a string in JSONB will be stored as text in PostgreSQL, a number will be stored as numeric (NaN and infinity values are not allowed), “null” does not translate because NULL in PostgreSQL is a different concept. Learn more in the talk…

JSONB PostgreSQL Notes
string text Remember that it conforms to the database encoding
number numeric NaN and infinity values are not allowed
boolean boolean Only lowercase `true` and `false` spelling is valid
null* Does not translate SQL NULL is a different concept.

*$ SELECT 'null'::jsonb IS NULL, 'null'::jsonb = 'null';

is_null = null
f t

JSONB example

From PostgreSQL documentation/JSON/Data types
– Simple scalar/primitive value
– Primitive values can be numbers, quoted strings, true, false, or null
SELECT ‘5’::json;

– Array of zero or more elements (elements need not be of same type)
SELECT ‘[1, 2, “foo”, null]’::json;

– Object containing pairs of keys and values
– Note that object keys must always be quoted strings
SELECT ‘{“bar”: “baz”, “balance”: 7.77, “active”: false}’::json;

– Arrays and objects can be nested arbitrarily
SELECT ‘{“foo”: [true, “bar”], “tags”: {“a”: 1, “b”: null}}’::json;

JSON vs JSONB

video at 3:46

JSON JSONB
Stores as if it was a TEXT field Stores as a binary format
Doesn’t modify the input string Trims whitespace and re-orders keys for
Preserves whitespace, including new lines more efficient searching
Duplicate keys are preserved Duplicate keys are removed, last value is kept
Only a few operators and functions Many operators
No indexes, no constraints Indexes and constraints
Inserts/updates are fast Inserts/updates a little bit slower
Has to parse and convert types

Some Cool Things

video at 6:22

A few cool things about JSONB are that you can filter the data, index specific keys, index the whole thing (jsonb_path_ops), or add constraints. You can interact with JSONB and the normal table columns within the same query.

Some Limitations

video at 7:01

It is hard to optimize your query as there are no statistics within JSONB or JSON fields, because it’s considered to be just one column.

Some others can be:

The concat operator (||) is only top-level
Max row size of 268 435 455 bytes (268.43 MB)
NaN and infinity are not allowed
null IS NOT NULL
\u0000 is not allowed

Experiment Time

video at 8:03

As for the setup, I installed PostgreSQL 12 on Ubuntu 18.04 and ran Python scripts, each query 12 times, 4 concurrently.

max_connections = 200
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

Experiment steps

Figure out how Twitter streaming API works.
Save about 4M Tweets as JSON (Avengers and Game of Thrones).
Save these 4M Tweets into a table, one for Avengers, one for Game of Thrones.
Join tweets_avengers and tweets_got into one table, still in JSON.
Convert to JSONB.
Model single table with all 2nd level JSON objects as JSONB fields.
Model the Tweet fully.
Run experiments.

You can see how the experiment went in the video.

JSON vs JSONB vs Tables

video at 13:10

In the real life example sourced by the experiment, you can see a comparison of the data stored in JSON, JSONB, and normal tables. Below is a comparison of table sizes, and a comparison of the time needed for a query, showing which one to use when you need to solve a problem quickly.

Comparison of the table size
Comparison of the query time

Check out the story time at the time 19:40 in the video to think about a possible use case.

When could you use JSONB?

video at 28:30

Here are some examples of when you can use JSONB:
Got NOSQL, but you need some ACID
Storing document data - like settings, API queries (HAR format?)…
Audit trail
Fast-changing data
Data with many optional columns/keys (Could inherited tables work?)
As an aggregate (e.g. storing JSONB into a table)
When you have JSON data and quickly need to do some queries

When not to use JSONB

video at 31:33

Here are some examples of when it could be a bad idea to use JSONB:
When you need to filter the data
When you need indexes
When you need constraints
When you need a key-value store
When you could use a table
When other people need to know what is going on
When you need to store a lot of data in one row

Conclusion

video at 33:49

Programming, the same as all other sorts of engineering, is about compromises. There is no absolute answer, maybe JSONB works for you, maybe not. But we need to try it to see what options we have and set benchmarks.

Share article via: