Django is famous for its simplicity, flexibility, and robust features, which make it a “batteries-included” framework offering comprehensive tools for web development, database operations, URL routing, HTML templating, and security. In this post and later posts (should time allow), I’ll delve into best practices for developing Django applications, particularly performant and easy-to-maintain APIs.
Let’s start with Django’s ORM, which serves as the cornerstone of any application. As developers, we often code models hastily without considering the long-term consequences. Rushed decisions usually lead to future complications or even data corruption, so it’s crucial to plan and design Django models carefully to ensure a robust foundation for your application.
This post covers best practices on these topics:
- Optimal number of models in Django apps
- Django model inheritance
- Denormalization in Django models
- Django Custom managers
- Empty values in Django char based fields
- Django model migrations consolidation
- Unique uuid identifier in Django model
- Django ORM advanced query tools
- Django model pk property for primary key fields
- Django model naming
- Update existing record in DB using Model.save()
1. Keep Number Of Models In Apps No More Than 10
If you’ve got 20 models in a single app, it’s time to slice and dice the app into smaller ones. To keep things manageable, keep the number of models to no more than ten per app. The bounded context principle from Domain-Driven Design (DDD) can help in organizing separate apps and grouping related models.
2. Don’t Use Multi Table Inheritance
It’s good practice to have a base model. Typically, fields such as ‘created_at’ and ‘updated_at’ are ideal candidates for the BaseModel. However, you should avoid multi-table inheritance, because it leads to confusion and significant overhead: every query on the child model requires a join on the parent model. In other words, subclassing a concrete (non-abstract) model in Django creates a new table for the child, and the resulting joins can hamper performance, especially in high-demand environments like game backends. To avoid this, handle inheritance manually with techniques like nullable fields, OneToOneFields, or ForeignKeys.
Bad practice:
from django.db import models


class Employee(models.Model):
    name = models.CharField(max_length=100)


# Subclassing a concrete model creates a second table that is joined to Employee.
class RegularEmployee(Employee):
    salary = models.IntegerField()
    bonus = models.IntegerField()
3. Denormalization Should be The Last Solution
We often reach for denormalization as the first solution to challenges in our projects, unaware that it adds complexity and increases the risk of data loss. It’s strongly recommended to consider other techniques, such as caching, before denormalizing. Consider denormalization only when other techniques don’t meet your needs.
4. Use Custom Managers For Custom DB Queries
Django’s model manager provides a convenient mechanism for encapsulating complex query logic related to a model. Try to place frequently used operations associated with a particular model in a custom manager to get reusable code as well as improved code organization and readability. In the following example, the get_published_posts() method filters blog posts with a status of “published”. We can use this method wherever we need to retrieve published posts, keeping our code DRY.
from django.db import models


class PostManager(models.Manager):
    def get_published_posts(self):
        """Retrieve all published blog posts."""
        return self.filter(status='published')


class BlogPost(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    status = models.CharField(max_length=10, choices=[('draft', 'Draft'), ('published', 'Published')])

    objects = PostManager()


# Usage:
published_posts = BlogPost.objects.get_published_posts()
The Manager method should be dedicated to database-related tasks. Take, for example, the Manager method below, which filters product records based on their numeric serial number using their display serial number that starts with ‘SN’:
class ProductManager(models.Manager):
    def get_by_display_serial_number(self, display_serial_number: str):
        # Parsing the display value here mixes input handling into the manager.
        numeric_serial_number = int(display_serial_number[2:])
        return self.filter(serial_number=numeric_serial_number)
This approach falls short in terms of code clarity and the single responsibility principle. It’s not a specialized database query or a complex operation; rather, it merely manipulates the input parameter, which is none of the Manager method’s business. A cleaner and more effective approach is to create a simple helper function that converts the display serial number to the numeric serial number, and then query with Django’s model filter:
def get_numeric_serial_number(display_serial_number: str) -> int:
    return int(display_serial_number[2:])


def some_service_or_view():
    ...
    numeric_serial_number = get_numeric_serial_number(display_serial_number)
    products = Product.objects.filter(serial_number=numeric_serial_number)
By segregating the conversion logic into a separate function, the code becomes more modular and comprehensible. Each function has a distinct purpose, adhering to the single responsibility principle, which improves maintainability and readability.
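A side benefit of keeping the conversion in a plain function is that it can validate its input and be tested without a database. The variant below is only a sketch: the prefix parameter and the validation rules are assumptions added for illustration, not part of the original helper.

```python
def get_numeric_serial_number(display_serial_number: str, prefix: str = "SN") -> int:
    """Convert a display serial such as 'SN00123' into its numeric part."""
    if not display_serial_number.startswith(prefix):
        raise ValueError(f"expected a serial number starting with {prefix!r}")
    digits = display_serial_number[len(prefix):]
    # isdecimal() rejects empty strings and anything int() could not parse.
    if not digits.isdecimal():
        raise ValueError(f"expected digits after {prefix!r}, got {digits!r}")
    return int(digits)


print(get_numeric_serial_number("SN00123"))  # 123
```

The manager, view, or service that calls it stays free of parsing details and only deals with the numeric value.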
5. Avoid null=True For Char Fields
It is generally discouraged to allow NULL values in string-based fields. Allowing NULL in such fields results in two possible representations for the absence of data: NULL or an empty string. To maintain clarity and consistency, Django conventionally favors the empty string to represent the absence of data.
However, there is an exception to this guideline when the field is declared with both unique=True and blank=True. In that case null=True is needed, because a second row saved with an empty string would violate the unique constraint, whereas multiple NULL values are allowed.
String-based fields include:
- CharField
- TextField
- EmailField
- URLField
- SlugField
- FilePathField
Bad practice:
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=150, null=True, blank=True)
Best Practice:
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=150, blank=True)
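The unique=True and blank=True exception mentioned above comes from how unique constraints treat NULL. The sketch below uses plain SQL through Python's built-in sqlite3 module rather than Django, and the person table and nickname column are hypothetical. SQLite is used only because it ships with Python; several other databases handle NULL in unique columns the same way, but it is worth checking yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a nullable, unique text column.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, nickname TEXT UNIQUE)")

# Two rows with NULL are accepted: NULL values are not treated as equal.
conn.execute("INSERT INTO person (nickname) VALUES (NULL)")
conn.execute("INSERT INTO person (nickname) VALUES (NULL)")

# A second empty string collides with the first one on the unique constraint.
conn.execute("INSERT INTO person (nickname) VALUES ('')")
try:
    conn.execute("INSERT INTO person (nickname) VALUES ('')")
    second_empty_string_accepted = True
except sqlite3.IntegrityError:
    second_empty_string_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(second_empty_string_accepted, row_count)  # False 3
```

This is why an optional unique string field is the one place where null=True on a CharField is reasonable.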
6. Consolidate Migrations for Cleaner Release
A release may include more than one model change. Consolidating all migrations into a single file per app for each release simplifies the management and deployment of database schema changes. This approach brings a cleaner release process and reduces the risk of migration-related issues. You can use Django’s squashmigrations command to bring a range of generated migrations to heel:
python manage.py squashmigrations <appname> <squashfrom> <squashto>
7. Use Two Unique Identifiers For Your Models
In real-world projects, use two identifiers for records in a Django model: a private identifier, often the primary key (id), and a public ID, represented by a UUID (uid). This approach offers both security and convenience, as it avoids revealing sensitive information about the data while still allowing unique identification of records. Keeping sequential identifiers private is always advisable, as they reveal sensitive information about our data, such as the number of records (e.g. products or accounts) we have, which we prefer to keep confidential:
import uuid

from django.db import models


class YourModel(models.Model):
    id = models.AutoField(primary_key=True)
    uid = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)
    name = models.CharField(max_length=100)
    description = models.TextField()

    def __str__(self):
        return f'{self.name} - {self.uid}'
Don’t use a UUID as the primary key. The issue is inefficient inserts caused by the non-sequential nature of UUIDs. In databases where the primary key is the clustered index by default (MySQL’s InnoDB and SQL Server, for example), rows are physically stored in key order, so inserting a key that sorts before existing ones forces the database to reorganize storage. With random UUIDs this happens almost all the time, which can slow inserts significantly as the table grows.
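A rough way to see how often a random UUID sorts before keys that already exist is to count such arrivals in plain Python. This is only an illustration of key ordering, not a database benchmark, and the helper name count_out_of_order is made up for this sketch.

```python
import uuid


def count_out_of_order(keys):
    """Count keys that sort before the largest key seen so far."""
    out_of_order = 0
    largest = None
    for key in keys:
        if largest is not None and key < largest:
            out_of_order += 1  # would land in the middle of an ordered index
        else:
            largest = key      # would be appended at the end
    return out_of_order


n = 1000
sequential_ids = list(range(1, n + 1))
random_uuids = [uuid.uuid4() for _ in range(n)]

print(count_out_of_order(sequential_ids))  # 0
print(count_out_of_order(random_uuids))    # varies per run, typically close to n
```

Auto-incrementing keys are always appended at the end, while nearly every random UUID falls somewhere in the middle of the existing key range.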
8. Use Advanced Query Tools Instead Of Processing Records In Python
Instead of processing data in Python, let Django’s advanced query tools handle the task for you. By doing this, we not only enhance performance but also get cleaner and more maintainable code. Let’s illustrate this with an example. Assume we need a list of all students whose Math scores are higher than their English scores. Without Django ORM query expressions, it can be done with the following code:
from models.students import Student

students = []
for student in Student.objects.iterator():
    if student.math_score > student.english_score:
        students.append(student)
The above code iterates through every Student record in the database in Python, one by one. It’s slow and memory-consuming, and it can lead to race conditions when the script runs concurrently with other user interactions with the same data. A more efficient and race-condition-free approach is to use Django query expressions:
from django.db.models import F
from models.students import Student
students = Student.objects.filter(math_score__gt=F('english_score'))
This way, we leverage the database itself to perform the comparison, enhancing project performance and stability.
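To make the idea concrete, the F('english_score') filter ends up as a WHERE clause that compares two columns of the same row. The sketch below runs roughly equivalent SQL through Python's built-in sqlite3 module; the table name and sample rows are assumptions for illustration, and the SQL Django actually generates will differ in table naming and quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, "
    "math_score INTEGER, english_score INTEGER)"
)
conn.executemany(
    "INSERT INTO student (name, math_score, english_score) VALUES (?, ?, ?)",
    [("Ada", 95, 80), ("Ben", 60, 75), ("Cy", 88, 88)],
)

# The column-to-column comparison runs inside the database,
# so only matching rows are sent back to Python.
rows = conn.execute(
    "SELECT name FROM student WHERE math_score > english_score"
).fetchall()
print(rows)  # [('Ada',)]
```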
9. Use pk instead of id
In Django, the id field is the default primary key, automatically generating a unique integer for each database record. The pk property, however, refers to the model’s designated primary key field, whether it’s id, student_id, or something else. Using pk throughout your code offers flexibility, allowing you to change the primary key field without modifying your code. This makes your code more readable, self-explanatory, and consistent, regardless of the primary key’s name.
Example 1 – Using id:
student = Student.objects.get(id=42)
print(student.id)
Example 2 – Using pk:
student = Student.objects.get(pk=42)
print(student.pk)
I compared the performance of pk against id in a queryset filtering and retrieval operation with 100,000 sample records. The operation time increased by only 8% in 100,000 retrieve operations, a small trade-off for the benefits of readability and consistency.
10. Django Model Naming
- Model names should use singular nouns to represent individual entities. This helps clarify the relationships between models and minimizes potential confusion.
- Django model names use CamelCase, the Python convention for class names, where each word starts with an uppercase letter and no underscores are used.
- A ManyToManyField should be named with a plural noun that reflects the associated model. For example, if an Author model has a ManyToManyField linked to a Book model, the field might be called “books.” A OneToOneField should use a singular noun that mirrors the related model, showing a one-to-one relationship. For instance, a User model could have a one-to-one link with a Profile model, with the field named “profile.”
11. Using update_fields with save()
To update specific columns in a database record, you can use the update_fields parameter when calling the save() method. This approach allows you to specify precisely which fields should be updated.
product = Product.objects.get(id=1)
product.name = "new product name"
product.save(update_fields=['name'])
The resulting SQL query will be:
UPDATE "product"
SET "name" = 'new product name'
WHERE "product"."id" = 1
You can also update multiple fields at once by including additional field names in the update_fields list.
Using this method is more efficient because it limits the database operation to only the specified fields, reducing unnecessary overhead and improving performance. Another key reason to use update_fields is to prevent data conflicts during concurrent updates. Consider a scenario with the Product model: if one user sets the is_deleted flag to True while another user changes the product’s name, and these actions happen in separate processes, using the generic save() method can lead to issues. The second process might unintentionally overwrite the is_deleted value back to False because it retains the outdated value in memory.
While projects with high levels of concurrent data modification may require more robust conflict management strategies, using update_fields to update only the necessary fields significantly reduces the risk of unintended side effects.
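The overwrite scenario described above can be reproduced with plain SQL. The sketch below uses Python's built-in sqlite3 module and a single connection to imitate two processes, so the table layout and the helper names (make_db, load, simulate) are assumptions for illustration, not Django code.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, is_deleted INTEGER)"
    )
    conn.execute("INSERT INTO product VALUES (1, 'old name', 0)")
    return conn


def load(conn):
    name, is_deleted = conn.execute(
        "SELECT name, is_deleted FROM product WHERE id = 1"
    ).fetchone()
    return {"name": name, "is_deleted": is_deleted}


def simulate(update_name_only):
    conn = make_db()
    stale = load(conn)  # process B reads the row
    conn.execute("UPDATE product SET is_deleted = 1 WHERE id = 1")  # process A soft-deletes
    stale["name"] = "new product name"  # process B edits its in-memory copy
    if update_name_only:
        # Comparable to save(update_fields=['name']): write only the changed column.
        conn.execute("UPDATE product SET name = ? WHERE id = 1", (stale["name"],))
    else:
        # Comparable to a plain save(): write every column from the stale copy.
        conn.execute(
            "UPDATE product SET name = ?, is_deleted = ? WHERE id = 1",
            (stale["name"], stale["is_deleted"]),
        )
    return load(conn)


print(simulate(update_name_only=False))  # {'name': 'new product name', 'is_deleted': 0}
print(simulate(update_name_only=True))   # {'name': 'new product name', 'is_deleted': 1}
```

In the first run the soft delete is silently lost; in the second run it survives because the stale is_deleted value is never written back.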
Finally, although these best practices offer pragmatic, tested solutions that focus on the reusability, readability, and reliability of code in Django development, there are always alternative approaches that may fit your requirements better in different scenarios.
Happy coding!✌