The Basics
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class User:
    id: int
    name: str
    email: str
    role: str = "user"
    active: bool = True
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tags: list[str] = field(default_factory=list)

# Auto-generated __init__:
alice = User(id=1, name="Alice Chen", email="alice@example.com", role="admin")
print(alice)
# User(id=1, name='Alice Chen', email='alice@example.com', role='admin', active=True, ...)

The decorator auto-generates __init__, __repr__, and __eq__. Fields with defaults must come after fields without defaults, the same rule as regular function parameters.
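To make the generated methods and the ordering rule concrete, here is a minimal sketch. The `Point` and `Broken` classes are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int = 0


# The generated __eq__ compares field values, not object identity.
same = Point(1, 2) == Point(1, 2)

# The generated __repr__ lists every field in definition order.
text = repr(Point(1))

# A field without a default after one with a default is rejected
# when the class is defined, just like an invalid function signature.
try:
    @dataclass
    class Broken:
        x: int = 0
        y: int
except TypeError as exc:
    ordering_error = str(exc)

print(same)  # True
print(text)  # Point(x=1, y=0)
print(ordering_error)
```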
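Never write tags: list = [] in a dataclass. As a plain class attribute, that default would be one list shared across all instances, so @dataclass rejects list, dict, and set defaults with a ValueError when the class is defined. Always use field(default_factory=list) to create a fresh list per instance.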
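A short sketch of both sides of this rule. With the standard-library decorator, a bare list default fails when the class is defined, while default_factory gives each instance its own list. `BadUser` and `GoodUser` are hypothetical names used only here.

```python
from dataclasses import dataclass, field

# A bare mutable default is refused with ValueError at class-definition time.
try:
    @dataclass
    class BadUser:
        tags: list = []
except ValueError as exc:
    mutable_default_error = str(exc)


@dataclass
class GoodUser:
    # default_factory is called once per instance, so lists are never shared.
    tags: list[str] = field(default_factory=list)


a, b = GoodUser(), GoodUser()
a.tags.append("admin")

print(mutable_default_error)
print(a.tags)  # ['admin']
print(b.tags)  # []
```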
field() — Fine-Grained Control
from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: str
    items: list[str] = field(default_factory=list)
    # exclude from __repr__ (e.g. sensitive data)
    _internal_token: str = field(default="", repr=False)
    # exclude from __init__ (computed field)
    item_count: int = field(init=False, repr=True)
    # exclude from __eq__ comparison
    timestamp: float = field(default=0.0, compare=False)

    def __post_init__(self):
        self.item_count = len(self.items)

__post_init__: Computed Fields and Validation
__post_init__ runs after the auto-generated __init__ — the right place for derived fields, validation, or type coercion:
from dataclasses import dataclass

@dataclass
class Money:
    amount: float
    currency: str = "INR"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError(f"Amount cannot be negative: {self.amount}")
        # Normalise to 2 decimal places
        self.amount = round(self.amount, 2)
        self.currency = self.currency.upper()

price = Money(amount=1299.999)
print(price)  # Money(amount=1300.0, currency='INR')

Frozen Dataclasses: Immutable Value Objects
Set frozen=True to make all fields read-only after creation. Because eq=True is the default, @dataclass then also generates __hash__ from the fields, so an instance whose field values are all hashable can be used as a dictionary key or in a set:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    latitude: float
    longitude: float

    def distance_to(self, other: "Coordinate") -> float:
        # Flat-plane (Euclidean) distance in degrees: a rough stand-in,
        # not a true Haversine great-circle distance
        dlat = abs(self.latitude - other.latitude)
        dlon = abs(self.longitude - other.longitude)
        return (dlat**2 + dlon**2) ** 0.5

home = Coordinate(latitude=28.6139, longitude=77.2090)    # Delhi
office = Coordinate(latitude=19.0760, longitude=72.8777)  # Mumbai
# home.latitude = 0  # raises FrozenInstanceError

# Works as dict key because it's hashable:
distances = {home: 0.0, office: home.distance_to(office)}

Inheritance
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    sound: str

@dataclass
class Dog(Animal):
    breed: str
    trained: bool = False

# Inherits name and sound; adds breed and trained
rex = Dog(name="Rex", sound="Woof", breed="German Shepherd", trained=True)
print(rex)
# Dog(name='Rex', sound='Woof', breed='German Shepherd', trained=True)

If the parent class has fields with defaults and the child class adds fields without defaults, @dataclass raises a TypeError when the child class is defined, because the generated __init__ would put a non-default parameter after a default one. Solution: either give the child fields defaults too, or make them keyword-only with field(kw_only=True) or @dataclass(kw_only=True) (Python 3.10+).
__slots__ for Memory Efficiency
Python 3.10 added slots=True to @dataclass. This creates a __slots__ class, preventing the creation of __dict__ and significantly reducing per-instance memory when you have thousands of objects:
from dataclasses import dataclass

@dataclass(slots=True)  # Python 3.10+
class Tick:
    symbol: str
    price: float
    volume: int
    timestamp: float

Benchmark: A list of 1 million plain dataclass instances typically uses ~360 MB. With slots=True, the same structure drops to ~120 MB, a 3× reduction. Exact figures vary with the Python version and the field values.
JSON Serialisation Patterns
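Dataclasses don't serialize to JSON natively, but the standard library's dataclasses.asdict() and dataclasses.astuple() convert an instance (recursing into nested dataclasses) into plain dictionaries and tuples. json.dumps() can handle the result as long as every field value is a JSON-compatible type; anything else, such as datetime, needs a default= hook: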
import dataclasses
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Event:
    id: str
    name: str
    ts: datetime

def event_serialiser(obj):
    # Fallback for values json.dumps() cannot encode on its own
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Unserializable: {type(obj)}")

# Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
evt = Event(id="ev_01", name="login", ts=datetime.now(timezone.utc))

# Serialize:
data = dataclasses.asdict(evt)
json_str = json.dumps(data, default=event_serialiser)

# Deserialize:
raw = json.loads(json_str)
evt2 = Event(**{**raw, "ts": datetime.fromisoformat(raw["ts"])})

For more complex cases (nested dataclasses, optional fields, camelCase conversion), consider dacite, marshmallow-dataclass, or pydantic.
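For a single level of nesting, the standard library alone can still round-trip the data. The sketch below shows the manual rebuilding step that those libraries automate. `Address`, `Customer`, and the `from_dict` helper are hypothetical names, not part of any library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:  # hypothetical nested dataclass
    city: str
    country: str = "IN"


@dataclass
class Customer:  # hypothetical outer dataclass
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "Customer":
        # asdict() turned Address into a plain dict, so rebuild it by hand.
        return cls(
            name=raw["name"],
            address=Address(**raw["address"]),
            tags=list(raw.get("tags", [])),
        )


original = Customer("Asha", Address("Pune"), ["vip"])
payload = json.dumps(asdict(original))
restored = Customer.from_dict(json.loads(payload))

print(payload)
print(restored == original)  # True
```

Each extra level of nesting needs another hand-written step like this, which is the point at which a dedicated library starts to pay off.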
Comparison: Dataclass vs NamedTuple vs attrs
| Feature | dataclass | NamedTuple | attrs |
|---|---|---|---|
| Mutable by default | ✅ Yes | ❌ No | ✅ Yes |
| Immutable option | frozen=True | Always | frozen=True |
| Hashable (frozen) | ✅ | ✅ | ✅ |
| Unpacking support | ❌ | ✅ | ❌ |
| Slots support | Python 3.10+ | Built-in | ✅ |
| Validators | Manual __post_init__ | Manual | Built-in |
| Standard library | ✅ | ✅ | ❌ (3rd party) |
| JSON-friendly | asdict() | _asdict() | attrs.asdict() |
- dataclass — general-purpose data containers, API response models, configuration objects. Best default choice.
- NamedTuple — when you need tuple unpacking, backward compatibility with tuple-expecting APIs, or CSV row types.
- attrs / pydantic — when you need built-in validators, serialization with aliasing, or strict runtime type checking in production APIs.
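A few rows of the table can be checked directly with the standard library. This is a small sketch using two hypothetical point types, `PointNT` and `PointDC`; attrs is left out because it is a third-party package.

```python
from dataclasses import astuple, dataclass, replace
from typing import NamedTuple


class PointNT(NamedTuple):  # hypothetical NamedTuple variant
    x: float
    y: float


@dataclass(frozen=True)
class PointDC:  # hypothetical frozen dataclass variant
    x: float
    y: float


# NamedTuple instances are tuples, so they unpack directly.
x, y = PointNT(1.0, 2.0)

# A dataclass needs an explicit conversion first.
dx, dy = astuple(PointDC(1.0, 2.0))

# Both are immutable here; "changing" a value means creating a copy.
moved_nt = PointNT(1.0, 2.0)._replace(x=5.0)
moved_dc = replace(PointDC(1.0, 2.0), x=5.0)

# Both are hashable, so equal values collapse in a set.
unique = {PointDC(1.0, 2.0), PointDC(1.0, 2.0)}

# A NamedTuple compares equal to a plain tuple; a dataclass does not.
print(PointNT(1.0, 2.0) == (1.0, 2.0))  # True
print(PointDC(1.0, 2.0) == (1.0, 2.0))  # False
```

The last two lines matter when choosing between them: tuple equality makes NamedTuple a good fit for tuple-expecting APIs, but it also means two unrelated NamedTuple types with the same values compare equal.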
Practical Pattern: Configuration Object
import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    db_url: str = field(default_factory=lambda: os.environ["DATABASE_URL"])
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "").lower() == "true")
    max_connections: int = 10
    allowed_origins: list[str] = field(
        default_factory=lambda: os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")
    )

    def __post_init__(self):
        if self.max_connections < 1:
            raise ValueError("max_connections must be at least 1")

# The factories read the environment when AppConfig() is called (at import
# time if this line sits at module level); raises KeyError if DATABASE_URL is unset
config = AppConfig()
print(config.debug)  # False unless the DEBUG env var is "true" (case-insensitive)
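One useful property of this pattern is that explicit arguments bypass the environment, which keeps tests independent of the machine they run on. The sketch below uses a trimmed-down hypothetical class, `ServiceConfig`, and an assumed SQLite URL rather than the full AppConfig.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ServiceConfig:  # hypothetical, trimmed-down variant of AppConfig
    db_url: str = field(default_factory=lambda: os.environ["DATABASE_URL"])
    debug: bool = field(
        default_factory=lambda: os.getenv("DEBUG", "").lower() == "true"
    )
    max_connections: int = 10

    def __post_init__(self):
        if self.max_connections < 1:
            raise ValueError("max_connections must be at least 1")


# A default_factory is only called for fields you do not pass,
# so DATABASE_URL does not need to be set here.
test_config = ServiceConfig(db_url="sqlite:///:memory:", debug=True)

# Validation in __post_init__ still runs for explicit values.
try:
    ServiceConfig(db_url="sqlite:///:memory:", max_connections=0)
except ValueError as exc:
    validation_error = str(exc)

print(test_config)
print(validation_error)  # max_connections must be at least 1
```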