Weekly Focus: Vectors
- What’s Postgres Got To Do With AI?
- Using postgres_scanner to read vectors from PG in DuckDB. [2]
A detailed paper from SIGMOD’20 [3] describes how Alibaba designed and built an approximate nearest neighbor search extension for vector similarity. Great deep dive into page structures for ANN indexes, and the code is even available (though not maintained).
Using vector functions in SingleStore’s SQL, though it is not clear how well the system scales, since the example uses only 7,000 vectors. [4]
An amazing use case (as far as I’m concerned, data swamps are real): representing individual columns in the embedding space using pre-trained transformer models, then using those vectors to find semantically similar data within your data. [5]
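To make the idea behind these links concrete, here is a minimal sketch of exact (brute-force) similarity search, the baseline that ANN indexes approximate and that SQL vector functions compute per row. The column names and embedding values are made up for illustration; they do not come from any of the linked posts, and real embeddings would come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact search: score every stored vector, keep the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for three table columns (invented values).
column_vectors = {
    "customer_email": [0.9, 0.1, 0.0],
    "contact_address": [0.8, 0.2, 0.1],
    "order_total": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], column_vectors, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Brute force scans every vector, so cost grows linearly with the collection. That is fine for a few thousand vectors and is exactly why larger collections need an ANN index.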
Learning
Use Apache Iceberg in a data lake to support incremental data processing.
Access Amazon Athena in your applications using the WebSocket API.
Guide to bitwise operators in CrateDB.
Grafana Labs webinars: Reduce MTTR, build beautiful Grafana dashboards, and more.
Anomaly detection on Prometheus metrics.
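For a feel of what anomaly detection on a metric series can look like, here is a minimal sketch using a rolling z-score. This is an assumed, generic technique, not necessarily the method used in the linked post, and the function name, window size, threshold, and sample values are all hypothetical.

```python
import statistics


def zscore_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0.0:
            # Flat history: treat any change at all as anomalous.
            if samples[i] != mean:
                anomalies.append(i)
            continue
        if abs(samples[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical request latencies in milliseconds (invented values).
latencies = [100, 102, 98, 101, 99, 100, 250, 101, 99]
flagged = zscore_anomalies(latencies)
print(flagged)
```

A fixed window like this ignores seasonality, and a spike inflates the statistics for the samples that follow it, so treat it as a starting point rather than a production detector.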
Deep Dive
A new lecture is out from the CMU Advanced Databases course on Parallel Hash Join Algorithms. If you are into data, you should have a very good reason for not watching this playlist.
Business
Amazing (as always) write-up about using TimescaleDB in the wild, and why compression is crucial for time-series databases. [6]
How Wiz used Amazon ElastiCache to improve performance and reduce costs.
How Delivery Hero uses Kubecost and Datadog to manage Kubernetes costs in the cloud.
An emerging buzzword for data platforms capable of handling both transactional and analytical workloads: “translytical.” A webinar from SingleStore describes precisely such platforms. [7]