Welcome to atomlite’s documentation!#
GitHub: https://github.com/lukasturcani/atomlite
atomlite is a Python library for simple molecular databases on
top of SQLite. Its goals are as follows:
Read and write molecules to SQLite easily.
Read and write JSON properties associated with molecules easily.
Allow users to interact with the database through SQL commands to fulfil more complex use cases.
In other words, atomlite should keep simple interactions with the
database simple, while keeping complex things achievable.
For an alternative to atomlite, which provides stronger integration with
RDKit and a greater focus on cheminformatics, see chemicalite.
Installation#
pip install atomlite
Getting help#
If you get stuck using atomlite, I encourage you to get in
touch on Discord (invite link) or by creating an issue on GitHub.
I’m happy to help!
Quickstart#
Adding molecules to the database#
First you create a database:
import atomlite
db = atomlite.Database("molecules.db")
Then you make some molecular entries:
import rdkit.Chem as rdkit
entry1 = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("C"))
entry2 = atomlite.Entry.from_rdkit("second", rdkit.MolFromSmiles("CN"))
And add them to the database:
db.add_entries([entry1, entry2])
Finally, you can retrieve the molecules with their keys:
for entry in db.get_entries(["first", "second"]):
    molecule = atomlite.json_to_rdkit(entry.molecule)
    print(entry.properties)
Tip
We can call Database.get_entries() with no parameters if
we want to retrieve all the molecules from the database.
See also
Database: For additional documentation.
Database.add_entries(): For additional documentation.
Database.get_entries(): For additional documentation.
Entry.from_rdkit(): For additional documentation.
json_to_rdkit(): For additional documentation.
Adding molecular properties#
We can add JSON properties to our molecular entries:
entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)
And retrieve them:
for entry in db.get_entries():
    print(entry.properties)
{'is_interesting': False}
See also
Database.add_entries(): For additional documentation.
Database.get_entries(): For additional documentation.
Entry.from_rdkit(): For additional documentation.
Updating molecular properties#
If we want to update molecular properties, we can use
Database.update_properties(). First, let’s
write our initial entry:
entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': False})
We can change existing properties and add new ones:
entry = atomlite.PropertyEntry(
    key="first",
    properties={"is_interesting": True, "new": 20},
)
db.update_properties(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': True, 'new': 20})
Or remove properties:
entry = atomlite.PropertyEntry("first", {"new": 20})
db.update_properties(entry, merge_properties=False)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'new': 20})
Note
The parameter merge_properties=False causes the entire property dictionary to
be replaced with the one in the update.
See also
Database.add_entries(): For additional documentation.
Database.get_entries(): For additional documentation.
Database.update_properties(): For additional documentation.
Entry.from_rdkit(): For additional documentation.
PropertyEntry: For additional documentation.
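To make the difference between the two update modes concrete, here is a minimal sketch that models them with plain dictionaries. The function apply_update is hypothetical and is not part of atomlite. It only reproduces the top-level behaviour shown in the outputs above, so how nested dictionaries are merged is left out.

```python
def apply_update(stored: dict, update: dict, merge_properties: bool = True) -> dict:
    """Model the two update modes shown above (top-level keys only).

    Hypothetical helper for illustration; not atomlite's implementation.
    """
    if merge_properties:
        # Keep existing keys and overwrite those present in the update.
        return {**stored, **update}
    # Discard the stored dictionary and keep only the update.
    return dict(update)


stored = {"is_interesting": False}
merged = apply_update(stored, {"is_interesting": True, "new": 20})
print(merged)  # {'is_interesting': True, 'new': 20}

replaced = apply_update(merged, {"new": 20}, merge_properties=False)
print(replaced)  # {'new': 20}
```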
Updating entries#
We can update whole molecular entries in the database. Let’s write our initial entry:
entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("C"),
    properties={"is_interesting": False},
)
db.add_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [6]}, properties={'is_interesting': False})
We can change the molecule:
entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"))
db.update_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'is_interesting': False})
Change existing properties and add new ones:
entry = atomlite.Entry.from_rdkit(
    key="first",
    molecule=rdkit.MolFromSmiles("Br"),
    properties={"is_interesting": True, "new": 20},
)
db.update_entries(entry)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'is_interesting': True, 'new': 20})
Or remove properties:
entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"), {"new": 20})
db.update_entries(entry, merge_properties=False)
for entry in db.get_entries():
    print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'new': 20})
Note
The parameter merge_properties=False causes the entire property dictionary to
be replaced with the one in the update.
See also
Database.add_entries(): For additional documentation.
Database.get_entries(): For additional documentation.
Database.update_entries(): For additional documentation.
Entry.from_rdkit(): For additional documentation.
Checking if a value exists in the database#
Sometimes you want to use a database as a cache to avoid recomputations. There is a simple way to do this!
molecule = rdkit.MolFromSmiles("CCC")
num_atoms = db.get_property("first", "$.physical.num_atoms")
if num_atoms is None:
    num_atoms = molecule.GetNumAtoms()
    db.set_property("first", "$.physical.num_atoms", num_atoms)
print(num_atoms)
3
See also
Database.get_property(): For additional documentation.
Database.set_property(): For additional documentation.
Database.get_bool_property(): For type-safe access to boolean properties.
Database.set_bool_property(): For type-safe setting of boolean properties.
Database.get_int_property(): For type-safe access to integer properties.
Database.set_int_property(): For type-safe setting of integer properties.
Database.get_float_property(): For type-safe access to float properties.
Database.set_float_property(): For type-safe setting of float properties.
Database.get_str_property(): For type-safe access to string properties.
Database.set_str_property(): For type-safe setting of string properties.
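If the same check-then-compute pattern is needed in several places, it can be wrapped in a small helper. The sketch below is an illustration only: get_or_compute and FakeDatabase are hypothetical names, and FakeDatabase is a dictionary-backed stand-in that exposes just the two methods used above so that the example runs without atomlite installed. With a real database we would pass the Database object instead.

```python
from typing import Any, Callable


class FakeDatabase:
    """Hypothetical stand-in exposing only get_property/set_property."""

    def __init__(self) -> None:
        self._values: dict[tuple[str, str], Any] = {}

    def get_property(self, key: str, path: str) -> Any:
        # Missing values come back as None, as in the example above.
        return self._values.get((key, path))

    def set_property(self, key: str, path: str, value: Any) -> None:
        self._values[(key, path)] = value


def get_or_compute(db: Any, key: str, path: str, compute: Callable[[], Any]) -> Any:
    """Return the cached value, computing and storing it on a cache miss."""
    value = db.get_property(key, path)
    if value is None:
        value = compute()
        db.set_property(key, path, value)
    return value


calls = []


def count_atoms() -> int:
    # Stand-in for an expensive calculation; records how often it runs.
    calls.append(1)
    return 3


db = FakeDatabase()
first = get_or_compute(db, "first", "$.physical.num_atoms", count_atoms)
second = get_or_compute(db, "first", "$.physical.num_atoms", count_atoms)
print(first, second, len(calls))  # 3 3 1
```

Because a missing value and a stored null both read back as None in this sketch, a computation that legitimately returns None would be repeated on every call.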
Valid property paths#
Given a property dictionary:
properties = {
    "a": {
        "b": [1, 2, 3],
        "c": 12,
    },
}
we can access the various properties with the following paths:
>>> db.get_property("first", "$.a")
{'b': [1, 2, 3], 'c': 12}
>>> db.get_property("first", "$.a.b")
[1, 2, 3]
>>> db.get_property("first", "$.a.b[1]")
2
>>> db.get_property("first", "$.a.c")
12
>>> db.get_property("first", "$.a.does_not_exist") is None
True
A full description of the syntax is provided here.
See also
Database.get_property(): For additional documentation.
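The paths above look like SQLite JSON paths. Assuming that is the syntax atomlite uses (an assumption, not something stated on this page), the same lookups can be tried with the standard library alone. This sketch calls SQLite's json_extract() on the example dictionary. The helper extract is a hypothetical name, and the code needs an SQLite build with the JSON functions enabled.

```python
import json
import sqlite3

properties = {"a": {"b": [1, 2, 3], "c": 12}}
conn = sqlite3.connect(":memory:")


def extract(path: str):
    """Evaluate a JSON path against the example dictionary using SQLite."""
    row = conn.execute(
        "SELECT json_extract(?, ?)",
        (json.dumps(properties), path),
    ).fetchone()
    return row[0]


print(extract("$.a.b[1]"))            # 2
print(extract("$.a.c"))               # 12
# Arrays and objects come back as JSON text, so we decode them.
print(json.loads(extract("$.a.b")))   # [1, 2, 3]
print(extract("$.a.does_not_exist"))  # None
```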
Using an in-memory database#
If you do not wish to write your database to a file, but only keep it in memory, you can do that with:
import atomlite
db = atomlite.Database(":memory:")
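One consequence worth knowing, assuming atomlite hands the ":memory:" string to SQLite unchanged: in plain sqlite3, every connection opened with ":memory:" gets its own private database, and its contents disappear when that connection is closed. The sketch below shows this with the standard library only. The helper table_names is a hypothetical name.

```python
import sqlite3

first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

# Create a table in the first in-memory database only.
first.execute("CREATE TABLE molecules (key TEXT PRIMARY KEY)")
first.execute("INSERT INTO molecules VALUES ('first')")


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List the tables visible through a connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    return [name for (name,) in rows]


print(table_names(first))   # ['molecules']
print(table_names(second))  # []
```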
Running SQL commands#
Sometimes you want to alter the database by running SQL commands
directly. For that, you can use Database.connection:
import atomlite
import rdkit.Chem as rdkit
db = atomlite.Database("molecules.db")
entry = atomlite.Entry.from_rdkit("first", rdkit.MolFromSmiles("Br"), {"new": 20})
db.add_entries(entry)
for row in db.connection.execute("SELECT * FROM molecules"):
    print(row)
('first', '{"atomic_numbers": [35]}', '{"new": 20}')
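Since the properties are stored as JSON text, SQL queries can also filter on them. The following sketch uses only sqlite3 and a stand-in molecules table whose three columns (key, molecule, properties) are an assumption based on the row printed above. The real schema created by atomlite may declare them differently. The second entry is made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; column names are assumed from the printed row.
conn.execute(
    "CREATE TABLE molecules (key TEXT PRIMARY KEY, molecule TEXT, properties TEXT)"
)
rows = [
    ("first", {"atomic_numbers": [35]}, {"new": 20}),
    ("second", {"atomic_numbers": [6]}, {"new": 5}),
]
conn.executemany(
    "INSERT INTO molecules VALUES (?, ?, ?)",
    [(key, json.dumps(mol), json.dumps(props)) for key, mol, props in rows],
)

# Select the keys of molecules whose "new" property exceeds a threshold.
query = (
    "SELECT key FROM molecules "
    "WHERE json_extract(properties, '$.new') > ? "
    "ORDER BY key"
)
keys = [key for (key,) in conn.execute(query, (10,))]
print(keys)  # ['first']
```

With an atomlite database, the same query string could presumably be passed to db.connection.execute().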
Usage with Python versions before 3.11#
Sometimes you have an atomlite database but you can’t use atomlite
because it requires Python 3.11, while the project you’re trying to use your
database with is stuck on an earlier Python version.
Fortunately, we can still interact with the database. For example, here we
add additional properties to a molecule using just sqlite3:
import sqlite3
import json
db = sqlite3.connect("molecules.db")
db.execute(
    "UPDATE molecules "
    "SET properties=json_patch(properties,?) "
    "WHERE key=?",
    (json.dumps({"new": 20}), "first"),
)
db.commit()
We can check that the updates are recognized when using atomlite:
import atomlite
db = atomlite.Database("molecules.db")
for entry in db.get_entries():
print(entry)
Entry(key='first', molecule={'atomic_numbers': [35]}, properties={'new': 20})
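The UPDATE statement above relies on SQLite's json_patch(), which applies an RFC 7396 merge patch: keys in the patch are added or overwritten, nested objects are merged recursively, and a null value removes a key. The sketch below shows these three cases with the standard library only. The helper patch is a hypothetical name, and the code needs an SQLite build with the JSON functions enabled.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")


def patch(target: dict, update: dict) -> dict:
    """Apply SQLite's json_patch() to two dictionaries and decode the result."""
    row = conn.execute(
        "SELECT json_patch(?, ?)",
        (json.dumps(target), json.dumps(update)),
    ).fetchone()
    return json.loads(row[0])


# New keys are added next to existing ones.
added = patch({"is_interesting": False}, {"new": 20})
print(added)  # {'is_interesting': False, 'new': 20}

# Nested objects are merged key by key.
nested = patch({"a": {"b": 1, "c": 2}}, {"a": {"c": 3}})
print(nested)  # {'a': {'b': 1, 'c': 3}}

# A null value in the patch removes the key.
removed = patch({"new": 20, "old": 1}, {"old": None})
print(removed)  # {'new': 20}
```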