Archlinux installation process on Dell XPS 15 7590

I am once more installing Archlinux after my SSD fried. I had documented the process last time I did this in a list of bullet points, this article meant to formalize it a bit more.

This is mostly targeted at my future self, so all the software, config, and procedure will be tailored to my needs. Other people might find some value in it though, so I'm putting it out there.

Continue Reading
Rust sudoku solvers part 0: intro

This series of blog post will cover the development of a (bunch of) sudoku solver(s) in Rust, using a wasm library and a frontend framework in Rust.

This post is very barebones for now, just pushing it out so I can work on implementing the projects section of my website, that is generated with Zola and requires at least one page in a given taxonomy to generate the taxonomy's pages.

Continue Reading
Create a Lambda in Rust using Terraform

This blog entry will take you through the steps of creating an AWS Lambda in Rust, and deploying it to AWS via Terraform.

The code showed in this article is hosted on Github. Each section has its own commit, so you can easily follow along with the repo.

I am very much a beginner in Rust, so any feedback or suggestion for improvement is most welcome.

Continue Reading
pg_explain_lexer: `EXPLAIN ANALYZE` syntax highlighting

For a document I'm in the process of redacting, I have to include lots of EXPLAIN ANALYZE outputs, since I'm discussing query plans. However, for people who are not used to reading explains, making sense of all the info in there can be quite daunting.

To improve the readability of this document, I'm making a custom Pygments lexer, to add some syntaxic coloration to the output of EXPLAIN. It's available on github.

Continue Reading
pg_dropbuffers extension

When comparing the execution time of a query executed multiple times, cache effects can't come into play, as the first execution (with a cold cache) will obviously be slower than the subsequent executions (with a warm cache), since fewer disk reads will be necessary (that is assuming the dataset doesn't fit in RAM; if it does, then it is better to do your benchmark once all the data is cached).

Therefore, it is needed to drop both the database cache and the system cache between two executions. An easy way to do that is to restart postgres, and to drop the system cache (search for drop_caches in that link). But rjuju and I figured that simply writing SELECT pg_drop_caches() would be more convenient, so we wrote a toy extension, called pg_dropbuffers, that is available here.

Continue Reading
PostgreSQL cardinality estimation quality

When PostgreSQL optimizes a query, it needs to estimate the number of tuples that each node of the query plan will have to handle: the cardinality of the node. The estimate the optimizer made, and the actual number of tuples, can be seen using EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM orders WHERE netamount > 100;
                                                    QUERY PLAN
 Seq Scan on orders  (cost=0.00..250.00 rows=8893 width=30) (actual time=0.035..17.633 rows=8897 loops=1)
   Filter: (netamount > '100'::numeric)
   Rows Removed by Filter: 3103
 Planning Time: 0.332 ms
 Execution Time: 21.807 ms
(5 rows)

The example queries are run on the dellstore2 dataset.

As you can read in the rows= parts of the query execution plan, the optimizer estimated that 8893 tuples would match the predicate; actually 8897 did. This number was estimated using the statistics it stored on the column: the process is explained in the docs.

This discrepancy between the estimated and actual cardinality can be measured, for example using the q-error, that is described in this research paper. It is the ratio between actual and estimated cardinalities for an underestimation, and its inverse for an overestimation, so that the q-error is always higher than 1. In the previous example, the q-error was very low (~1), but if the data contains correlations, things can get messy:

Continue Reading
PostgreSQL awesome resources

PostgreSQL has a very good documentation, but sometimes information about its internals is hard to find, or buried in the mailing lists, or scattered across a number of blogs.

In order to somewhat overcome that, I started to collect various blog articles and presentations about PostgreSQL in a github repository: postgres awesome resources. It's inspired by awesome postgres, but focuses more on understanding how postgres works and how to use it, rather than its ecosystem.

Continue Reading