Michael Ovies dot Com


Postgres Pages Revisited

I'd like to revisit my original write up on pages. I feel, now, that it's kind of dry. And that's a shame because pages are pretty cool. They show up all over the place! And even in other curious places, like the Write Ahead Log, you'll find a similar structure appear. So it is that I think they're quite useful structures to have knowledge of. Let's pop them open a bit more than I did last time!

Now, let's start off with this diagram. First, to be totally transparent, I yoinked it from the official Postgres docs. The biggest reason I did this is because this image is ubiquitous. On all the sites I explored when I first began looking into pages, people were basically copying this image (with no attribution). So, since everyone seems to agree it does the job well, I might as well cut right to the source.

Okay, now to the meat of the post, what is a page? Well, first, let's very briefly unwind things just a bit and step back to tables - or "heaps" as Postgres calls them. It is inside this file that we can begin to explore pages. And, as you might suspect, this is the file which lives on disk and where, traditionally, you'd think of "row data" living. Okay, now, on to pages.

So, inside this heap is an array of pages. Each page is 8 KB large by default. As you insert data into a table, you are actually adding what are called "tuples" into the pages. These tuples are the records of data you're storing, the rows. Now, you might ask yourself, what happens if I store more data in a tuple than 8 KB? That's pretty small, afterall. Well, Postgres adds another page* to the array of pages and adds the tuple there.

What about inside of the pages though? Well, in the image above, you might first notice all of the empty space. That's the freespace. That's where new tuples can be inserted, or not if there's no more room! The PageHeaderData contains general info about the page, such as a checksum, pointer offsets, and the size of the page. The "special" space is strictly for indexes and assists with traversal. In a regular ol' table it's not used. Finally, the items are where tuples are stored. These are stored beginning at the end of the freespace. Conversely, their pointers, begin at the start of the freespace.

And that's it! Well, not really, but the paths of exploration explode in number from this point. For example, see that "special" space in the image above? That's used only for indexes. That means that indexes use this data structure too! Or how about when you run a query with EXPLAIN ANALYZE and you see the "Buffer: shared..." output? Those are blocks, in-memory pages! Understanding pages even helps to conceptualize what happens during a VACUUM and a VACUUM FULL. Or, finally, one of my absolute favorite appearance of pages (yes, I have a favorite), the "lossy" and "exact" data represented by a TIDBitmap.

Let's wrap up with the TIDBitmap. It took me, like, a week of digging through email groups and finally the Postgres source to find it. I was very excited when I found it. Going back to our old friend EXPLAIN ANALYZE, you may have at some point or another encountered output expressing the presence of "lossy" and "exact" blocks. We know what "blocks" are now, but what is lossy and exact? Well, when there is so much data that work_mem is exceeded in the query, some or all of it can become lossy. In contrast to that, "exact" means the query analyzer has a direct reference to the page where the data is. So, then, what exactly is "lossy?" Well, that means the query analyzer knows there is some data in these blocks, but isn't sure how much. It could be one tuple or all of them. In fact, this is where you'll encounter the "Recheck Cond" which means those pages will need to be checked further to find the actual tuples you requested. And the cool thing about the TIDBitmap is that it can represent both at once.


* Of course, new tuples don't always create new pages. There may be some fragmentation present and the tuple can be placed in a page in the middle of the array, or the beginning.


Home | Reach out to devnull @ michaelovies dot com