Use case for Azimutt: Analyzing your database

Scaling things is really hard. What works at one size won't at a bigger one. It can be easy to keep 10 tables well-designed and consistent, but it's a whole different game for a database with 100 tables. You need different tools and processes.

How to keep a high-quality database?

If you know, let us know 😉

Jokes aside, it's clear there is no simple answer to this: it requires a lot of expertise and time. Nevertheless, keeping quality at scale comes with challenges that are hard for a human but simple for a program. With 100 tables or more, it becomes nearly impossible for people to avoid every obvious mistake, or to spot them all when reviewing the database. A program, on the other hand, can easily check whatever we want at scale, helping to keep consistency and avoid obvious errors.

At Azimutt, we believe this could be a great help in improving databases that are too big for humans to review. And as feedback from the field: we have learned interesting things and found issues to fix on every single database we ran our analysis on. So even if it's just an early start and may sound trivial, I can only encourage you to try it and see how it goes.

Azimutt analysis: an early attempt

On other topics (design, exploration and documentation), it's easy to explain how Azimutt does a great job: the purpose is clear, and we think Azimutt answers it pretty well. Here it's very different: finding great heuristics is not trivial, especially for the general case. But that doesn't prevent us from trying, nor you from giving feedback and suggesting ideas 💪

From experience, scaling things often comes with better quality, more automation and more standardization. Of course Azimutt can't judge your database design, but with access to the whole schema (and soon the whole database, if you choose to), it can identify some smells you would be wise to check, much like a static code analysis tool. Here are the checks we implemented:

  • Missing primary key: it's uncommon not to define a primary key on a table, so if a table doesn't have one, make sure it's really intended, or fix it.
  • Missing foreign keys: this one is checked using column naming conventions (split the column name into words and look for a table + column matching several consecutive words). Omitting foreign keys can be done on purpose, but even then, you will want to add the relation in Azimutt.
  • Inconsistent types: if several columns share the same name (in different tables), they probably represent the same thing and so should have the same type (ex: if you have email columns declared as varchar(56) and varchar(128), you may want to homogenize them). A small sketch of this check follows the list.
  • Too big tables: regardless of your design, tables with many columns (> 30) are probably storing a god object and should be avoided most of the time.
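
To give a feel for how simple such a check can be, here is a minimal sketch of the inconsistent types heuristic in JavaScript. It runs on a made-up flat list of columns, so it's purely illustrative of the idea, not Azimutt's actual implementation:

// Hypothetical flat schema: one entry per column (not Azimutt's internal model)
const schema = [
  { table: 'users',    column: 'email', type: 'varchar(56)' },
  { table: 'invoices', column: 'email', type: 'varchar(128)' },
  { table: 'invoices', column: 'total', type: 'numeric(10,2)' }
]
// Group the distinct types seen for each column name
const typesByColumn = schema.reduce((acc, c) => {
  const types = acc[c.column] || new Set()
  types.add(c.type)
  return Object.assign(acc, { [c.column]: types })
}, {})
// Keep only the column names declared with more than one type
const inconsistent = Object.entries(typesByColumn)
  .filter(([, types]) => types.size > 1)
  .map(([column, types]) => `${column}: ${[...types].join(', ')}`)
inconsistent // ['email: varchar(56), varchar(128)']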

This is just a beginning, and we will expand these checks as we find more smells, but also based on user feedback. If you think a database analyzer could be useful, tell us what you expect from it, and let's see if we can integrate it into Azimutt.

One more thing...

We truly believe that scale and quality mostly come from automation. As Azimutt has a structured representation of your database schema, we thought it could be useful to let you access it to automate tasks. For that, you only need to open the JavaScript console and use the azimutt variable 🤯

Here is an example extracting all the column types:

// collect every column from every table loaded in Azimutt
const columns = azimutt.getAllTables().flatMap(t => t.columns)
// deduplicate the column types and sort them alphabetically
const types = [...new Set(columns.map(c => c.type))].sort()
types
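
Building on that, a few more lines flag the column names declared with several different types across your schema, the same smell the analysis looks for. Note that reading name on a column object is an assumption here; inspect the objects in your own console to confirm the property:

// Gather every column again, then group the distinct types per column name
// (assumes each column object exposes `name` and `type`)
const allColumns = azimutt.getAllTables().flatMap(t => t.columns)
const typesByName = allColumns.reduce((acc, c) => {
  (acc[c.name] = acc[c.name] || new Set()).add(c.type)
  return acc
}, {})
// Column names that appear with more than one type
Object.entries(typesByName).filter(([, types]) => types.size > 1)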

Or a more complex one generating a Mermaid graph from your current layout:

// index the tables currently shown in the layout by their qualified name
const tables = azimutt.getVisibleTables()
  .reduce((acc, t) => Object.assign({}, acc, {[`${t.schema}.${t.table}`]: t}), {})
// keep only the relations between visible tables
const relations = azimutt.getAllRelations()
  .filter(r => tables[r.src.table] && tables[r.ref.table])
// build the Mermaid definition: one node per table, one labeled edge per relation
const mermaid = ['graph TD']
  .concat(Object.values(tables).map(t => `    ${t.table}`))
  .concat(relations.map(r => `    ${r.src.table.split('.')[1]} -->|${r.src.column}| ${r.ref.table.split('.')[1]}`))
  .join('\n')
mermaid
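
If your browser console provides the copy() helper (Chrome and Firefox consoles do), you can then send the result straight to your clipboard and paste it into any Mermaid renderer:

copy(mermaid) // copies the generated diagram definition to the clipboard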

Hope this can help you with some tasks, as it already did for us... A more complete article about this will come soon, but this was just a small hint to help you manage your schema at scale...