Welcome to atomlite’s documentation!

Tip

⭐ Star us on GitHub! ⭐

GitHub: https://github.com/lukasturcani/atomlite

atomlite is a Python library for simple molecular databases on top of SQLite. Its goals are as follows:

  1. Read and write molecules to SQLite easily.

  2. Read and write JSON properties associated with molecules easily.

  3. Allow users to interact with the database through SQL commands to fulfil more complex use cases.

In other words, atomlite should keep simple interactions with the database simple, while keeping complex things achievable.

For an alternative to atomlite that provides stronger integration with RDKit and a greater focus on cheminformatics, see chemicalite.

Installation

pip install atomlite

Getting help

If you get stuck using atomlite, I encourage you to get in touch on Discord (invite link) or by creating an issue on GitHub. I’m happy to help!

Quickstart

Adding molecules to the database

First you create a database:

import atomlite
db = atomlite.Database("molecules.db")

Then you make some molecular entries:

import rdkit.Chem as rdkit
entry1 = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("C"))
entry2 = atomlite.Entry.from_rdkit("second", rdkit.MolFromSmiles("CN"))

And add them to the database:

db.add_entries([entry1, entry2])

Finally, you can retrieve the molecules with their keys:

for entry in db.get_entries(["first", "second"]):
    molecule = atomlite.json_to_rdkit(entry.molecule)
    print(entry.properties)

Tip

We can call Database.get_entries() with no parameters if we want to retrieve all the molecules from the database.

Adding molecular properties

We can add JSON properties to our molecular entries:

entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)

And retrieve them:

for entry in db.get_entries():
    print(entry.properties)
{'is_interesting': False}

Retrieving molecular properties as a DataFrame

We can retrieve the properties of molecules as a DataFrame:

db.add_entries(
    [
        atomlite.Entry.from_rdkit(
            key="first",
            molecule=rdkit.MolFromSmiles("C"),
            properties={"num_atoms": 1, "is_interesting": False},
        ),
        atomlite.Entry.from_rdkit(
            key="second",
            molecule=rdkit.MolFromSmiles("CN"),
            properties={"num_atoms": 2, "is_interesting": True},
        ),
    ]
)
print(db.get_property_df(["$.num_atoms", "$.is_interesting"]))
shape: (2, 3)
┌────────┬─────────────┬──────────────────┐
│ key    ┆ $.num_atoms ┆ $.is_interesting │
│ ---    ┆ ---         ┆ ---              │
│ str    ┆ i64         ┆ bool             │
╞════════╪═════════════╪══════════════════╡
│ first  ┆ 1           ┆ false            │
│ second ┆ 2           ┆ true             │
└────────┴─────────────┴──────────────────┘

Updating molecular properties

If we want to update molecular properties, we can use Database.update_properties(). First, let’s write our initial entry:

entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': False})

We can change existing properties and add new ones:

entry = atomlite.PropertyEntry(
    key="first",
    properties={"is_interesting": True, "new": 20},
)
db.update_properties(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': True, 'new': 20})

Or remove properties:

entry = atomlite.PropertyEntry("first", {"new": 20})
db.update_properties(entry, merge_properties=False)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'new': 20})

Note

The parameter merge_properties=False causes the entire property dictionary to be replaced with the one in the update.
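
As a rough illustration of the difference between the two modes, the sketch below mimics them with plain Python dictionaries. It is not atomlite's implementation: apply_update is a hypothetical helper, and it only performs a shallow merge, so it says nothing about how nested properties are handled.

```python
def apply_update(stored: dict, update: dict, merge_properties: bool = True) -> dict:
    """Hypothetical helper mimicking the two update modes on plain dicts."""
    if merge_properties:
        # Keep existing keys and overwrite the ones present in the update.
        return {**stored, **update}
    # Discard the stored dictionary and keep only the update.
    return dict(update)


stored = {"is_interesting": True, "new": 20}
merged = apply_update(stored, {"new": 21})
replaced = apply_update(stored, {"new": 21}, merge_properties=False)
print(merged)
print(replaced)
```

In the merged result "is_interesting" survives, while in the replaced result only "new" remains.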

Updating entries

We can update whole molecular entries in the database. Let’s write our initial entry:

entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': False})

We can change the molecule:

entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"))
db.update_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'is_interesting': False})

Change existing properties and add new ones:

entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("Br"),
    properties={"is_interesting": True, "new": 20},
)
db.update_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'is_interesting': True, 'new': 20})

Or remove properties:

entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"), {"new": 20})
db.update_entries(entry, merge_properties=False)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'new': 20})

Note

The parameter merge_properties=False causes the entire property dictionary to be replaced with the one in the update.

Checking if a value exists in the database

Sometimes you want to use a database as a cache to avoid recomputations. There is a simple way to do this!

molecule = rdkit.MolFromSmiles("CCC")
num_atoms = db.get_property("first", "$.physical.num_atoms")
if num_atoms is None:
    num_atoms = molecule.GetNumAtoms()
    db.set_property("first", "$.physical.num_atoms", num_atoms)
print(num_atoms)
3
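
The pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of atomlite: get_or_compute is a hypothetical function that only relies on the get_property and set_property calls shown above, and FakeDatabase is a hypothetical in-memory stand-in so that the example runs without atomlite installed.

```python
from typing import Any, Callable


def get_or_compute(db: Any, key: str, path: str, compute: Callable[[], Any]) -> Any:
    """Return the value stored at path, computing and storing it if missing."""
    value = db.get_property(key, path)
    if value is None:
        value = compute()
        db.set_property(key, path, value)
    return value


class FakeDatabase:
    """Hypothetical stand-in exposing the two methods the helper needs."""

    def __init__(self) -> None:
        self.values: dict[tuple[str, str], Any] = {}

    def get_property(self, key: str, path: str) -> Any:
        return self.values.get((key, path))

    def set_property(self, key: str, path: str, value: Any) -> None:
        self.values[(key, path)] = value


db_stub = FakeDatabase()
calls = []


def expensive() -> int:
    # Record each call so we can see how often the computation runs.
    calls.append(1)
    return 3


first = get_or_compute(db_stub, "first", "$.physical.num_atoms", expensive)
second = get_or_compute(db_stub, "first", "$.physical.num_atoms", expensive)
print(first, second, len(calls))
```

The second call is served from the stored value, so the computation runs only once. One caveat of this design is that a stored value of None cannot be told apart from a missing one, so such values would be recomputed on every call.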

Valid property paths

Given a property dictionary:

properties = {
    "a": {
        "b": [1, 2, 3],
        "c": 12,
    },
}

we can access the various properties with the following paths:

>>> db.get_property("first", "$.a")
{'b': [1, 2, 3], 'c': 12}
>>> db.get_property("first", "$.a.b")
[1, 2, 3]
>>> db.get_property("first", "$.a.b[1]")
2
>>> db.get_property("first", "$.a.c")
12
>>> db.get_property("first", "$.a.does_not_exist") is None
True

A full description of the syntax is provided here.
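
The paths above resemble SQLite's JSON path syntax. Assuming that atomlite hands these paths to SQLite more or less unchanged (an assumption, not something this page states), the same lookups can be reproduced with the standard library alone. The extract function below is a hypothetical helper written for this sketch, and it needs a SQLite build that includes the JSON functions.

```python
import json
import sqlite3

properties = {"a": {"b": [1, 2, 3], "c": 12}}
connection = sqlite3.connect(":memory:")


def extract(path: str):
    """Evaluate a JSON path against properties with SQLite's json_extract."""
    value, kind = connection.execute(
        "SELECT json_extract(:doc, :path), json_type(:doc, :path)",
        {"doc": json.dumps(properties), "path": path},
    ).fetchone()
    # Objects and arrays come back as JSON text and need decoding.
    if kind in ("object", "array"):
        return json.loads(value)
    # JSON booleans come back as 1 or 0, so map them to Python bools.
    if kind in ("true", "false"):
        return kind == "true"
    # Numbers and strings are returned as-is; a missing path gives None.
    return value


for path in ["$.a", "$.a.b", "$.a.b[1]", "$.a.c", "$.a.does_not_exist"]:
    print(path, extract(path))
```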

Using an in-memory database

If you do not wish to write your database to a file, but only keep it in memory, you can do that with:

import atomlite
db = atomlite.Database(":memory:")

Running SQL commands

Sometimes you want to alter the database by running SQL commands directly. For that, you can use Database.connection:

import atomlite
import rdkit.Chem as rdkit
db = atomlite.Database("molecules.db")
entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"), {"new": 20})
db.add_entries(entry)
for row in db.connection.execute("SELECT * FROM molecules"):
    print(row)
('first', '{"atomic_numbers": [35]}', '{"new": 20}')
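
Because the row printed above stores properties as JSON text, SQLite's JSON functions can be used to filter on them. The sketch below is self-contained: it builds a stand-in molecules table with plain sqlite3 instead of opening an atomlite database. The table definition is an assumption made for illustration (in particular the column name molecule), and atomlite's real schema may differ.

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
# Stand-in schema for illustration; atomlite's real table may differ.
connection.execute(
    "CREATE TABLE molecules (key TEXT PRIMARY KEY, molecule TEXT, properties TEXT)"
)
rows = [
    ("first", {"atomic_numbers": [6]}, {"num_atoms": 1}),
    ("second", {"atomic_numbers": [6, 7]}, {"num_atoms": 2}),
]
connection.executemany(
    "INSERT INTO molecules VALUES (?, ?, ?)",
    [(key, json.dumps(mol), json.dumps(props)) for key, mol, props in rows],
)
# Select the keys of molecules whose num_atoms property exceeds 1.
keys = [
    key
    for (key,) in connection.execute(
        "SELECT key FROM molecules "
        "WHERE json_extract(properties, '$.num_atoms') > ? "
        "ORDER BY key",
        (1,),
    )
]
print(keys)
```

With an atomlite database, a similar SELECT statement could presumably be passed to db.connection.execute() in the same way as the query above.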

Usage with Python versions before 3.11

Sometimes you have an atomlite database but can’t use atomlite because it requires Python 3.11, while the project you want to use the database with is stuck on an earlier Python version.

Fortunately, we can still interact with the database. For example, here we add additional properties to a molecule using just sqlite3:

import sqlite3
import json
db = sqlite3.connect("molecules.db")
db.execute(
    "UPDATE molecules "
    "SET properties=json_patch(properties,?) "
    "WHERE key=?",
    (json.dumps({"new": 20}), "first"),
)
db.commit()

We can check that the updates are recognized when using atomlite:

import atomlite
db = atomlite.Database("molecules.db")
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'new': 20})
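
The json_patch function used in the UPDATE statement above applies a merge patch: keys in the patch overwrite or extend the stored object, and a null value removes a key. The sketch below demonstrates this on an in-memory database without touching molecules.db. The patch function is a hypothetical helper written for this example, and it needs a SQLite build that includes the JSON functions.

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")


def patch(target: dict, update: dict) -> dict:
    """Apply SQLite's json_patch to two dictionaries and decode the result."""
    (result,) = connection.execute(
        "SELECT json_patch(?, ?)", (json.dumps(target), json.dumps(update))
    ).fetchone()
    return json.loads(result)


stored = {"is_interesting": False, "old": 1}
# Existing keys are overwritten and new keys are added.
updated = patch(stored, {"is_interesting": True, "new": 20})
# A null (None) value in the patch removes the key.
pruned = patch(updated, {"old": None})
print(updated)
print(pruned)
```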
