The Basics

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class User:
    id: int
    name: str
    email: str
    role: str = "user"
    active: bool = True
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tags: list[str] = field(default_factory=list)

# Auto-generated __init__:
alice = User(id=1, name="Alice Chen", email="alice@example.com", role="admin")
print(alice)
# User(id=1, name='Alice Chen', email='alice@example.com', role='admin', active=True, ...)

The decorator auto-generates __init__, __repr__, and __eq__. Fields with defaults must come after fields without defaults — the same rule as regular function parameters.
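
A brief sketch of what the generated methods provide and of what happens when the ordering rule is broken. Point, Bad, and make_bad_class are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0  # defaulted field placed after the required one


# Generated __eq__ compares instances field by field.
same = Point(1, 2) == Point(1, 2)

# Generated __repr__ lists every field by name.
text = repr(Point(1))


def make_bad_class():
    """Try to define a dataclass with a required field after a defaulted one."""

    @dataclass
    class Bad:
        x: int = 0
        y: int  # no default after a default: rejected when the class is created

    return Bad


try:
    make_bad_class()
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

print(same, text, ordering_error)
```

The TypeError is raised while the class is being defined, not when an instance is created, so the mistake surfaces at import time.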

⚠️ Mutable Default Trap

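Never write tags: list = [] in a dataclass. On a plain class, such an attribute would be one list shared across all instances, so @dataclass rejects list, dict, and set defaults with a ValueError when the class is defined. Always use field(default_factory=list) to create a fresh list per instance.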
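
A small sketch of both sides of this rule, assuming Python 3.9+ for the list[str] annotation. The first part shows the decorator refusing a bare list default; the second shows default_factory giving each instance its own list. Broken, Tagged, and define_with_bare_list are hypothetical names.

```python
from dataclasses import dataclass, field


def define_with_bare_list():
    """Attempt the mutable-default pattern inside a function so the error can be caught."""

    @dataclass
    class Broken:
        tags: list = []  # rejected by the dataclass decorator

    return Broken


try:
    define_with_bare_list()
    rejected = False
except ValueError:
    rejected = True


@dataclass
class Tagged:
    tags: list[str] = field(default_factory=list)


a, b = Tagged(), Tagged()
a.tags.append("admin")  # only a's list changes
print(rejected, a.tags, b.tags)
```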

field() — Fine-Grained Control

from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: str
    items: list[str] = field(default_factory=list)

    # exclude from __repr__ (e.g. sensitive data)
    _internal_token: str = field(default="", repr=False)

    # exclude from __init__ (computed field)
    item_count: int = field(init=False, repr=True)

    # exclude from __eq__ comparison
    timestamp: float = field(default=0.0, compare=False)

    def __post_init__(self):
        self.item_count = len(self.items)
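
To see the effect of each field() option, here is a short usage sketch. It repeats the Order class so that it runs on its own; the order ids, items, and token values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    items: list[str] = field(default_factory=list)
    _internal_token: str = field(default="", repr=False)
    item_count: int = field(init=False, repr=True)
    timestamp: float = field(default=0.0, compare=False)

    def __post_init__(self):
        self.item_count = len(self.items)


first = Order("A-1", ["pen", "ink"], _internal_token="secret", timestamp=1.0)
second = Order("A-1", ["pen", "ink"], _internal_token="secret", timestamp=2.0)

print(first)            # the token is absent from the repr
print(first == second)  # timestamps differ but are ignored by __eq__

try:
    Order("A-2", item_count=5)  # init=False fields are not __init__ parameters
except TypeError:
    print("item_count cannot be passed to __init__")
```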

__post_init__: Computed Fields and Validation

__post_init__ runs after the auto-generated __init__ — the right place for derived fields, validation, or type coercion:

from dataclasses import dataclass

@dataclass
class Money:
    amount: float
    currency: str = "INR"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError(f"Amount cannot be negative: {self.amount}")
        # Normalise to 2 decimal places
        self.amount = round(self.amount, 2)
        self.currency = self.currency.upper()

price = Money(amount=1299.999)
print(price)  # Money(amount=1300.0, currency='INR')
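
As one possible take on the type-coercion use case, the sketch below stores the amount as a Decimal instead of a float. Price is a hypothetical variant of Money, not a replacement for it, and the choice of quantize() with two decimal places is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Price:  # hypothetical variant of Money that stores a Decimal
    amount: Decimal
    currency: str = "INR"

    def __post_init__(self):
        # Coerce via str() so a float input does not carry binary rounding noise.
        self.amount = Decimal(str(self.amount)).quantize(Decimal("0.01"))
        if self.amount < 0:
            raise ValueError(f"Amount cannot be negative: {self.amount}")
        self.currency = self.currency.upper()


price = Price(amount=1299.999, currency="inr")
print(price)

try:
    Price(amount=-5)
    negative_rejected = False
except ValueError:
    negative_rejected = True
```

Note that dataclasses do not enforce annotations at runtime, which is why a float can be passed for a field annotated as Decimal and converted in __post_init__.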

Frozen Dataclasses: Immutable Value Objects

Set frozen=True to make all fields read-only after creation. With the default eq=True, a frozen dataclass also gets a generated __hash__, so instances can be used as dictionary keys or set members (provided every field value is itself hashable):

from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    latitude: float
    longitude: float

    def distance_to(self, other: "Coordinate") -> float:
        # Euclidean distance in degrees (a rough stand-in, not a true Haversine great-circle distance)
        dlat = abs(self.latitude - other.latitude)
        dlon = abs(self.longitude - other.longitude)
        return (dlat**2 + dlon**2) ** 0.5

home = Coordinate(latitude=28.6139, longitude=77.2090)    # Delhi
office = Coordinate(latitude=19.0760, longitude=72.8777)  # Mumbai

# home.latitude = 0  # raises FrozenInstanceError

# Works as dict key because it's hashable:
distances = {home: 0.0, office: home.distance_to(office)}
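
Since a frozen instance cannot be modified, changes are usually expressed by building a new object. The sketch below uses dataclasses.replace() from the standard library for that, and also shows the rejected assignment and the hashing behaviour. The Coordinate class is repeated (without distance_to) so the block runs on its own.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Coordinate:
    latitude: float
    longitude: float


home = Coordinate(latitude=28.6139, longitude=77.2090)

try:
    home.latitude = 0.0
except FrozenInstanceError:
    print("assignment rejected")

# replace() builds a new instance instead of mutating the original.
moved = replace(home, latitude=19.0760)

# Equal field values give equal hashes, so duplicates collapse in a set.
unique = {home, Coordinate(28.6139, 77.2090), moved}
print(len(unique))
```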

Inheritance

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    sound: str

@dataclass
class Dog(Animal):
    breed: str
    trained: bool = False

    # Inherits name and sound; adds breed and trained

rex = Dog(name="Rex", sound="Woof", breed="German Shepherd", trained=True)
print(rex)
# Dog(name='Rex', sound='Woof', breed='German Shepherd', trained=True)

⚠️ Inheritance Default Order

If the parent class has fields with defaults and the child class adds fields without defaults, Python raises a TypeError when the child class is defined. Solution: either give the child fields defaults too, or make them keyword-only with field(kw_only=True) or @dataclass(kw_only=True) (Python 3.10+).

__slots__ for Memory Efficiency

Python 3.10 added slots=True to @dataclass. This creates a __slots__ class, preventing the creation of __dict__ and significantly reducing per-instance memory when you have thousands of objects:

@dataclass(slots=True)  # Python 3.10+
class Tick:
    symbol: str
    price: float
    volume: int
    timestamp: float

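Benchmark: A list of 1 million plain dataclass instances typically uses ~360 MB. With slots=True, the same structure drops to ~120 MB, roughly a 3× reduction. Exact figures depend on the Python version and the field values, so measure on your own workload.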

JSON Serialisation Patterns

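Dataclasses don't serialize to JSON natively, but the standard library's dataclasses.asdict() and dataclasses.astuple() recursively convert an instance into plain dictionaries and tuples. json.dumps() can handle the result as long as every field value is a JSON type; anything else, such as a datetime, needs a default= hook: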

import json
import dataclasses
from dataclasses import dataclass
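from datetime import datetime, timezone

@dataclass
class Event:
    id: str
    name: str
    ts: datetime

def event_serialiser(obj):
    # json.dumps() calls this hook for any value it cannot encode itself
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Unserializable: {type(obj)}")

# datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
evt = Event(id="ev_01", name="login", ts=datetime.now(timezone.utc))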

# Serialize:
data = dataclasses.asdict(evt)
json_str = json.dumps(data, default=event_serialiser)

# Deserialize:
raw = json.loads(json_str)
evt2 = Event(**{**raw, "ts": datetime.fromisoformat(raw["ts"])})

For more complex cases (nested dataclasses, optional fields, camelCase conversion), consider dacite, marshmallow-dataclass, or pydantic.
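
Before reaching for a library, it may help to see how far the standard library goes with nested dataclasses. In this sketch, asdict() handles the nesting on the way out, while the way back is rebuilt by hand. Address and Customer are hypothetical classes and the values are made up.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    city: str
    pin: str


@dataclass
class Customer:
    name: str
    address: Address
    phones: list[str] = field(default_factory=list)


customer = Customer("Alice Chen", Address("Delhi", "110001"), ["555-0100"])

# asdict() recurses into nested dataclasses, lists, tuples, and dicts.
payload = json.dumps(asdict(customer))

# Going back, nested objects have to be rebuilt explicitly.
raw = json.loads(payload)
restored = Customer(**{**raw, "address": Address(**raw["address"])})
print(restored == customer)
```

The manual rebuild step grows with every level of nesting, which is the main reason the libraries above become attractive.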

Comparison: Dataclass vs NamedTuple vs attrs

| Feature | dataclass | NamedTuple | attrs |
|---|---|---|---|
| Mutable by default | ✅ Yes | ❌ No | ✅ Yes |
| Immutable option | frozen=True | Always | frozen=True |
| Hashable (frozen) | ✅ | ✅ | ✅ |
| Unpacking support | ❌ | ✅ | ❌ |
| Slots support | Python 3.10+ | ✅ | Built-in |
| Validators | Manual __post_init__ | Manual | Built-in |
| Standard library | ✅ | ✅ | ❌ (3rd party) |
| JSON-friendly | asdict() | _asdict() | attrs.asdict() |

📌 When to Use Each
  • dataclass — general-purpose data containers, API response models, configuration objects. Best default choice.
  • NamedTuple — when you need tuple unpacking, backward compatibility with tuple-expecting APIs, or CSV row types.
  • attrs / pydantic — when you need built-in validators, serialization with aliasing, or strict runtime type checking in production APIs.
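
The unpacking difference between the two standard-library options can be sketched as follows. RowNT and RowDC are hypothetical names; attrs and pydantic are left out because they are third-party packages.

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class RowNT(NamedTuple):
    symbol: str
    price: float


@dataclass
class RowDC:
    symbol: str
    price: float


# A NamedTuple is a tuple, so it unpacks and indexes directly.
symbol, price = RowNT("ACME", 10.5)

# A dataclass is not iterable; convert with astuple() to unpack it.
dc_symbol, dc_price = astuple(RowDC("ACME", 10.5))

try:
    _a, _b = RowDC("ACME", 10.5)
    dataclass_unpacks = True
except TypeError:
    dataclass_unpacks = False

print(symbol, price, dc_symbol, dc_price, dataclass_unpacks)
```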

Practical Pattern: Configuration Object

import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    db_url: str = field(default_factory=lambda: os.environ["DATABASE_URL"])
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "").lower() == "true")
    max_connections: int = 10
    allowed_origins: list[str] = field(
        default_factory=lambda: os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")
    )

    def __post_init__(self):
        if self.max_connections < 1:
            raise ValueError("max_connections must be at least 1")

config = AppConfig()   # reads the environment when instantiated; KeyError if DATABASE_URL is unset
print(config.debug)    # False unless the DEBUG env var is set to "true" (case-insensitive)
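
One consequence of using default_factory is that explicit arguments bypass the environment entirely, which can be handy in tests. The sketch below repeats a trimmed AppConfig so it runs on its own; the URLs are placeholder values, and setting os.environ inside the script is done only to make the example self-contained.

```python
import os
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    db_url: str = field(default_factory=lambda: os.environ["DATABASE_URL"])
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "").lower() == "true")
    max_connections: int = 10


# Explicit arguments skip the default_factory, so no environment is needed.
test_config = AppConfig(db_url="sqlite:///:memory:", debug=True)

# Factories run on each instantiation, so the current environment is read here.
os.environ["DATABASE_URL"] = "postgresql://localhost/example"  # placeholder value
os.environ["DEBUG"] = "TRUE"
env_config = AppConfig()

print(test_config.db_url, env_config.db_url, env_config.debug)
```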