A Large Database Does Not Mean Large shared_buffers
A co-worker of mine did a blog post last year that I’ve found incredibly useful when assisting clients with getting shared_buffers tuned accurately.
Setting shared_buffers the hard way
You can follow his queries there for using pg_buffercache to find out how your shared_buffers are actually being used. But I had an incident recently that I thought would be interesting to share that shows how shared_buffers may not need to be set nearly as high as you believe it should be. Or it can equally show you that you definitely need to increase it. Object names have been sanitized to protect the innocent.
IMPORTANT NOTE: An important caveat to using this extension is that when you query pg_buffercache, it takes locks on the shared buffers in order to get an accurate assessment of usage. For the most part this locking is brief and you probably won't even notice anything, but on extremely busy systems you may see some queries take a bit longer than usual. This is not something to monitor in real time for any extended period; it's mostly something to check occasionally when you're noticing performance problems. If you do set up automated monitoring for it, don't be aggressive.
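If the extension isn't available yet, it ships with PostgreSQL's contrib modules and just needs to be created in the database you want to inspect (a minimal sketch, assuming the contrib packages are installed):

database=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION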
To set the stage, the database's total size is roughly 260GB and the use case is high data ingestion, with some reporting done on just the most recent data at any given time. shared_buffers is set to 8GB. The other thing to note is that this is the only database in the cluster. pg_buffercache has info for all databases in the cluster, but when you join against pg_class to get object information, you can only do this for one database at a time.
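For reference, you can confirm the running value right from psql:

database=# SHOW shared_buffers;
 shared_buffers
----------------
 8GB
(1 row)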
database=# SELECT c.relname
, pg_size_pretty(count(*) * 8192) as buffered
, round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
, round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE pg_relation_size(c.oid) > 0
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;
            relname             | buffered | buffers_percent | percent_of_relation
--------------------------------+----------+-----------------+---------------------
 table1                         | 7479 MB  |            91.3 |                 9.3
 table2                         | 362 MB   |             4.4 |               100.0
 table3                         | 311 MB   |             3.8 |                 0.8
 table4                         | 21 MB    |             0.3 |               100.0
 pg_attrdef_adrelid_adnum_index | 16 kB    |             0.0 |               100.0
 table5                         | 152 kB   |             0.0 |                 7.7
 index5                         | 16 kB    |             0.0 |                14.3
 pg_index_indrelid_index        | 40 kB    |             0.0 |                 8.8
 pg_depend_depender_index       | 56 kB    |             0.0 |                 1.0
 pg_cast_source_target_index    | 16 kB    |             0.0 |               100.0
(10 rows)
You can see that table1 is taking up the vast majority of the space here, and it's a large table, so only 9% of it is actually in shared_buffers. What's more interesting, though, is how much of that table's space is actually in high demand.
database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'table1'::regclass
AND usagecount >= 2;
 pg_size_pretty
----------------
 2016 kB
(1 row)
Data blocks that go into and come out of postgres all go through shared_buffers. Just to review the blog post I linked to above: whenever a block is used in shared memory, its usage counter for the clock-sweep eviction algorithm is incremented; the counter ranges from 0 to 5, 5 marking extremely high-use data blocks. This means high-usage blocks are likely to be kept in shared_buffers (if there's room) and low-usage blocks will get moved out if space for higher-usage ones is needed. We believe that a simple insert or update sets a usagecount of 1, so now we look at the difference when the usagecount threshold is dropped to that.
database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'table1'::regclass
AND usagecount >= 1;
 pg_size_pretty
----------------
 4946 MB
(1 row)
So shared_buffers is actually being filled mostly by the data ingestion process, but relatively little of that data is of any further use afterwards. If anything of greater importance needed space in shared_buffers, that inserted data would quickly get flushed out of shared memory due to its low usagecount.
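A quick way to see this skew across the whole of shared_buffers at once is to group the buffers by their usagecount (a sketch using just the pg_buffercache view itself; buffers not currently in use show up with a NULL usagecount):

database=# SELECT usagecount
, count(*) AS buffers
, pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;

On an ingestion-heavy system like this one, you'd expect the totals to be weighted heavily toward the low end of the usagecount range.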
So, with pg_buffercache installed, we've found that the below query seems to give a good estimate of an optimal, minimum shared_buffers setting:
database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE usagecount >= 3;
 ideal_shared_buffers
----------------------
 640 MB
(1 row)
This is the sort of query you would run after your database has been running through its expected workload for a while. Also, note my use of the key word minimum. This does not account for unexpected spikes in shared_buffers usage that may occur during a session of reporting queries or the like. So you definitely want to set it higher than this, but it can at least show you how effectively postgres is using its shared memory. In general, we've found the typical suggestion of 8GB to be a great starting point for shared_buffers.
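Keep in mind that shared_buffers can only be changed with a full restart of the cluster, so plan accordingly; the setting itself lives in postgresql.conf (the value below is just this system's current setting, not a recommendation):

# postgresql.conf -- requires a restart to take effect
shared_buffers = 8GB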
So, in the end, the purpose of this post was to show that shared_buffers is something that needs further investigation to really be set optimally, and there is a pretty easy method for figuring it out once you know where to look.
UPDATE:
I’ve written a followup post to this one as well: A Small Database Does Not Mean Small shared_buffers.
So, as someone commented below, you don't really need to join against pg_class and pg_database to get the ideal suggested minimum. This also avoids having to manually total things up across multiple databases in the cluster. The reason for joining against those two was to be able to identify which databases and objects the blocks in shared buffers were associated with; pg_class can only identify the objects in the database you're currently connected to.
Also, for really high-traffic databases with fluctuating query activity, the suggested minimum query isn't something you can run just once; it has to be run multiple times because the values can vary drastically. Below are the results of running the shorter query just a few times within less than a minute on a different client of ours that has a much different traffic pattern (OLTP) than the one above. There are 46 databases in the cluster with a total size of roughly 900GB: 800GB in one database, 30GB in the next largest, and quickly getting smaller from there. For this one we actually have shared_buffers set down to 4GB, and it's been working great for years.
kfiske@database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
ideal_shared_buffers
----------------------
1431 MB
(1 row)
Time: 259.196 ms
kfiske@database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
ideal_shared_buffers
----------------------
1566 MB
(1 row)
Time: 495.255 ms
kfiske@database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
ideal_shared_buffers
----------------------
1217 MB
(1 row)
Time: 278.755 ms
kfiske@database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
ideal_shared_buffers
----------------------
1092 MB
(1 row)
Time: 260.278 ms
kfiske@database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
ideal_shared_buffers
----------------------
999 MB
(1 row)
Time: 251.809 ms
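And if you still want to know which databases those high-usage buffers belong to without connecting to each one in turn, pg_buffercache's reldatabase column can be rolled up cluster-wide (a sketch; buffers not currently in use have a NULL reldatabase and are dropped by the join):

kfiske@database=# SELECT d.datname
, pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_database d ON b.reldatabase = d.oid
GROUP BY d.datname
ORDER BY count(*) DESC;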