Articles - maahl.net

Summary

2024-07-07

This series of blog post will cover the installation and setup of a self-hosted Grafana-based observability stack. This stack will use OpenTelemetry to generate and collect logs, metrics, and traces, and tools from the Grafana ecosystem to visualize and analyze them. The use of OpenTelemetry allows for shipping logs to a different backend later on with hopefully minimal changes to the things that produce those logs. From the Grafana ecosystem, we will use the "LGTM" stack (Loki, Grafana, Tempo, Mimir), and Grafana Alerting.

Archlinux installation process on Dell XPS 15 7590

Summary

2023-11-27

I am once more installing Archlinux after my SSD fried. I had documented the process last time I did this in a list of bullet points, this article meant to formalize it a bit more.

This is mostly targeted at my future self, so all the software, config, and procedure will be tailored to my needs. Other people might find some value in it though, so I'm putting it out there.

Create a Lambda in Rust using Terraform

Summary

2023-11-05

This blog entry will take you through the steps of creating an AWS Lambda in Rust, and deploying it to AWS via Terraform.

The code showed in this article is hosted on Github. Each section has its own commit, so you can easily follow along with the repo.

I am very much a beginner in Rust, so any feedback or suggestion for improvement is most welcome.

pg_explain_lexer: `EXPLAIN ANALYZE` syntax highlighting

Summary

2018-11-30

For a document I'm in the process of redacting, I have to include lots of EXPLAIN ANALYZE outputs, since I'm discussing query plans. However, for people who are not used to reading explains, making sense of all the info in there can be quite daunting.

To improve the readability of this document, I'm making a custom Pygments lexer, to add some syntaxic coloration to the output of EXPLAIN. It's available on github.

pg_dropbuffers extension

Summary

2018-11-29

When comparing the execution time of a query executed multiple times, cache effects can't come into play, as the first execution (with a cold cache) will obviously be slower than the subsequent executions (with a warm cache), since fewer disk reads will be necessary (that is assuming the dataset doesn't fit in RAM; if it does, then it is better to do your benchmark once all the data is cached).

Therefore, it is needed to drop both the database cache and the system cache between two executions. An easy way to do that is to restart postgres, and to drop the system cache (search for drop_caches in that link). But rjuju and I figured that simply writing SELECT pg_drop_caches() would be more convenient, so we wrote a toy extension, called pg_dropbuffers, that is available here.

PostgreSQL cardinality estimation quality

Summary

2018-06-01

When PostgreSQL optimizes a query, it needs to estimate the number of tuples that each node of the query plan will have to handle: the cardinality of the node. The estimate the optimizer made, and the actual number of tuples, can be seen using EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM orders WHERE netamount > 100;

                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..250.00 rows=8893 width=30) (actual time=0.035..17.633 rows=8897 loops=1)
   Filter: (netamount > '100'::numeric)
   Rows Removed by Filter: 3103
 Planning Time: 0.332 ms
 Execution Time: 21.807 ms
(5 rows)

The example queries are run on the dellstore2 dataset.

As you can read in the rows= parts of the query execution plan, the optimizer estimated that 8893 tuples would match the predicate; actually 8897 did. This number was estimated using the statistics it stored on the column: the process is explained in the docs.

This discrepancy between the estimated and actual cardinality can be measured, for example using the q-error, that is described in this research paper. It is the ratio between actual and estimated cardinalities for an underestimation, and its inverse for an overestimation, so that the q-error is always higher than 1. In the previous example, the q-error was very low (~1), but if the data contains correlations, things can get messy:

PostgreSQL awesome resources

Summary

2018-05-31

PostgreSQL has a very good documentation, but sometimes information about its internals is hard to find, or buried in the mailing lists, or scattered across a number of blogs.

In order to somewhat overcome that, I started to collect various blog articles and presentations about PostgreSQL in a github repository: postgres awesome resources. It's inspired by awesome postgres, but focuses more on understanding how postgres works and how to use it, rather than its ecosystem.