Random codewalks

Comparing sizes of protobuf vs json

update: now comparing gzipped json vs gzipped protobuf

Google's Protocol Buffers (protobuf) is a binary format that claims to be much more compact than json and other text formats, but just how much less space does it require? Does the claim hold for large arrays of data?

In this blogpost I will compare the sizes of the two formats.

Test-data with array of tickers

I will generate test-data with a homemade tool, which you can find on github: https://github.com/nilsmagnus/protobuf-json-xml-size-comparison

The content of the data is defined in a proto-message:

syntax = "proto3";

package sample;

message Test {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  repeated Ticker tickers = 4;
}

message Ticker {
  string name = 1;
  float value = 2;
}

There is 1 string, 2 ints and an array of tickers of possibly unlimited length.

The ticker-name is a random string of length 3 and the value is a random float between 0.0 and 9.99. In json, a message with 2 tickers in the array could look like this:

{
  "query": "myQuery",
  "page_number": 42,
  "result_per_page": 100,
  "tickers": [
    {
      "name": "rPs",
      "value": 9.768923
    },
    {
      "name": "WEo",
      "value": 6.067048
    }
  ]
}
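To make the size measurement concrete, here is a minimal Python sketch that builds a message of the same shape and measures its json size. It is not the tool from the github repo: the function names `make_message` and `compact_json_size` are made up for illustration, and it assumes the json is written without whitespace and that an empty ticker-list is left out. Those assumptions are consistent with the 58 bytes reported for zero tickers in the raw json table.

```python
import json
import random
import string


def make_message(n_tickers, seed=0):
    """Build a message shaped like the Test proto, with random tickers."""
    rng = random.Random(seed)
    message = {"query": "myQuery", "page_number": 42, "result_per_page": 100}
    tickers = [
        {
            # random 3-letter name and a random value between 0.0 and 9.99
            "name": "".join(rng.choice(string.ascii_letters) for _ in range(3)),
            "value": round(rng.uniform(0.0, 9.99), 6),
        }
        for _ in range(n_tickers)
    ]
    if tickers:  # assumption: an empty list is omitted from the output
        message["tickers"] = tickers
    return message


def compact_json_size(message):
    """Size in bytes of the message serialized as json without whitespace."""
    return len(json.dumps(message, separators=(",", ":")).encode("utf-8"))


print(compact_json_size(make_message(0)))  # 58
```

With tickers in the list the exact size depends on the random values, since a float such as 1.5 takes fewer characters than 9.768923.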

The numbers

Comparing only the raw size of json would be unfair, since json is usually gzipped before it is transferred between client and server.

However, keep in mind that gzipping costs additional cpu time, which is not needed when protobuf is sent uncompressed.

The size-numbers are in bytes.

Raw json

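| no of tickers | raw json (bytes) | protobuf (bytes) | protobuf size (% of raw json) |
|---|---|---|---|
| 0 | 58 | 13 | 22.4 |
| 1 | 102 | 25 | 24.5 |
| 2 | 133 | 37 | 27.8 |
| 10 | 396 | 133 | 33.6 |
| 20 | 724 | 253 | 34.9 |
| 200 | 6578 | 2413 | 36.7 |
| 2000 | 65250 | 24013 | 36.8 |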

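Protobuf is clearly the winner for all sizes of the ticker-list, but the advantage is largest when the ticker-list is short: the protobuf message is 22% of the raw json size with no tickers and levels off at about 37% for 200 tickers or more.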
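The protobuf sizes in the table follow a simple pattern: 13 bytes for the fixed fields plus 12 bytes per ticker. The Python sketch below shows where those bytes come from by hand-encoding the message with the standard protobuf wire format (one-byte tags, varints for the ints, length-prefixed strings, 4-byte floats). It is an illustration only, not the protobuf library and not the tool from the repo. The helper names are made up, and it assumes the sample values `"myQuery"`, 42 and 100 from the json example, small non-negative ints, and non-zero float values.

```python
import struct


def varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number, wire_type):
    """Field key: field number and wire type packed into one varint."""
    return varint((field_number << 3) | wire_type)


def encode_string(field_number, text):
    """Wire type 2: tag, byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return tag(field_number, 2) + varint(len(data)) + data


def encode_ticker(name, value):
    """Ticker body: name (string) plus value (wire type 5, 4 bytes)."""
    return encode_string(1, name) + tag(2, 5) + struct.pack("<f", value)


def encode_test(query, page_number, result_per_page, tickers):
    """Hand-encode a Test message; tickers is a list of (name, value)."""
    out = encode_string(1, query)                 # 1 + 1 + 7 = 9 bytes
    out += tag(2, 0) + varint(page_number)        # 2 bytes
    out += tag(3, 0) + varint(result_per_page)    # 2 bytes
    for name, value in tickers:
        body = encode_ticker(name, value)         # 10 bytes
        out += tag(4, 2) + varint(len(body)) + body  # 12 bytes per ticker
    return out


for n in (0, 1, 2, 10, 20, 200, 2000):
    size = len(encode_test("myQuery", 42, 100, [("rPs", 9.768923)] * n))
    print(n, size)  # sizes: 13, 25, 37, 133, 253, 2413, 24013
```

In json every ticker repeats the field names `"name"` and `"value"` as text, while protobuf spends a single byte on each field tag. That is why the per-ticker cost is 12 bytes here against roughly 31 bytes in raw json.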

Gzipped json and gzipped protobuf

The common case for transferring json-messages is to gzip them first, so let's see how that affects our numbers.

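| no of tickers | gzipped json (bytes) | gzipped protobuf (bytes) | gzipped protobuf size (% of gzipped json) |
|---|---|---|---|
| 0 | 82 | 42 | 51.21 |
| 1 | 125 | 54 | 43.20 |
| 2 | 142 | 64 | 45.07 |
| 10 | 235 | 137 | 58.29 |
| 20 | 331 | 230 | 69.48 |
| 200 | 1970 | 1629 | 82.69 |
| 2000 | 17539 | 14808 | 84.42 |
| 20000 | 171154 | 146378 | 85.52 |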

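Gzipped protobuf clearly wins for the shorter ticker-lists (43-51% of the gzipped json size for 0 to 2 tickers), but loses most of its advantage as the ticker-list grows, ending at about 85% for 2000 tickers or more.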

Protobuf is still the winner on all sizes.
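One detail in the two tables is easy to miss: for the smallest messages gzip makes the payload larger, not smaller (82 bytes gzipped vs 58 bytes raw for json with zero tickers). The gzip format adds at least 18 bytes of header and trailer, and a short message has little repetition to compress. The Python sketch below shows the effect for json using only the standard library. It is a rough illustration with a made-up helper `json_sizes`, not the tool from the repo, so the gzipped numbers will not match the table exactly.

```python
import gzip
import json
import random
import string


def json_sizes(n_tickers, seed=0):
    """Return (raw, gzipped) byte sizes of a compact json test message."""
    rng = random.Random(seed)
    message = {"query": "myQuery", "page_number": 42, "result_per_page": 100}
    if n_tickers:  # assumption: an empty ticker-list is omitted
        message["tickers"] = [
            {
                "name": "".join(rng.choice(string.ascii_letters) for _ in range(3)),
                "value": round(rng.uniform(0.0, 9.99), 6),
            }
            for _ in range(n_tickers)
        ]
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


for n in (0, 2000):
    raw, zipped = json_sizes(n)
    # with 0 tickers gzip grows the message, with 2000 it shrinks it
    print(n, raw, zipped)
```

For large ticker-lists the repeated field names compress very well, which is how gzipped json closes most of the gap to protobuf.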

Conclusion

Protocol buffers is a clear winner for small messages: with zero tickers the raw protobuf message (13 bytes) is only 16% of the gzipped json size (82 bytes), and gzipped protobuf is about half the size of gzipped json.

However, when large arrays of data are transferred, gzipped protobuf is still smaller but loses most of its advantage: it is about 85% of the gzipped json size for 2000 tickers or more.

Read more

Read more about protobuf in the documentation: https://developers.google.com/protocol-buffers/