# 🔐 Security Architecture Decision Records
## ADR-001: JWT + HMAC Dual Authentication
### Decision
Use JWT for client authentication + HMAC for request integrity verification.
### Context
- A JWT alone is vulnerable to token theft (XSS, interception)
- HMAC verifies that the request was not tampered with in transit
- Combining both provides defense in depth
### Solution
```
Request Headers:
├─ Authorization: Bearer <jwt_token> # WHO: Authenticate user
├─ X-Signature: HMAC_SHA256(...) # WHAT: Verify content
├─ X-Timestamp: unixtime # WHEN: Prevent replay
└─ X-Client-Id: telegram_bot # WHERE: Track source
```
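The header scheme above could be signed and checked roughly as follows. This is a minimal sketch, not the project's implementation: the ADR does not say what `HMAC_SHA256(...)` covers, so the canonical message (timestamp, method, path, body), the 300-second freshness window, and the function names `sign_request` and `verify_request` are all assumptions.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed freshness window; not specified in this ADR


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over an assumed canonical form of the request."""
    message = b"\n".join(
        [str(timestamp).encode(), method.upper().encode(), path.encode(), body]
    )
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature, now=None) -> bool:
    """Reject stale timestamps first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future: possible replay
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Including the timestamp in the signed message matters: otherwise an attacker could replay an old body with a fresh `X-Timestamp` value.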
### Trade-offs
| Pros | Cons |
|------|------|
| More secure | Slight performance overhead |
| Covers multiple attack vectors | More complex debugging |
| MVP ready | Requires client cooperation |
| Can be disabled in MVP | More header management |
### Status
**IMPLEMENTED**

---
## ADR-002: Redis Streams for Event Bus (vs RabbitMQ)
### Decision
Use Redis Streams instead of RabbitMQ for event-driven notifications.
### Context
- Already using Redis for caching/sessions
- Simpler setup for MVP
- Don't need RabbitMQ's clustering (yet)
- Redis Streams has built-in message ordering
### Solution
```
Event Stream: "events"
├─ transaction.created
├─ transaction.executed
├─ budget.alert
├─ goal.completed
└─ member.invited
Consumer Groups:
├─ telegram_bot (consumes all)
├─ notification_worker (consumes alerts)
└─ audit_logger (consumes all)
```
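One detail worth making explicit: a Redis consumer group receives every entry in the stream, so a group that "consumes alerts" has to filter on its own side and still acknowledge the entries it skips. The sketch below illustrates that under assumptions: `client` is taken to be a redis-py `redis.Redis` created with `decode_responses=True`, the consumer groups already exist, the mapping of `notification_worker` to `budget.alert` is a guess, and the helper names are hypothetical.

```python
STREAM = "events"

# Event types each consumer group acts on; None means "all".
# The notification_worker entry is an assumption about what "alerts" covers.
GROUP_FILTERS = {
    "telegram_bot": None,
    "notification_worker": {"budget.alert"},
    "audit_logger": None,
}


def emit_event(client, event_type: str, payload: dict) -> str:
    """Append one event to the stream; payload values must be flat strings/numbers."""
    fields = {"type": event_type, **payload}
    return client.xadd(STREAM, fields)


def wants(group: str, event_type: str) -> bool:
    """Return True if this group should act on the given event type."""
    allowed = GROUP_FILTERS[group]
    return allowed is None or event_type in allowed


def process_batch(client, group: str, consumer: str, handler, count: int = 10) -> int:
    """Read new entries for the group, handle the matching ones, acknowledge all."""
    handled = 0
    for _stream, entries in client.xreadgroup(group, consumer, {STREAM: ">"}, count=count):
        for entry_id, fields in entries:
            if wants(group, fields["type"]):
                handler(fields)
                handled += 1
            # Ack skipped entries too, so they do not pile up as pending
            client.xack(STREAM, group, entry_id)
    return handled
```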
### Trade-offs
| Pros | Cons |
|------|------|
| Simple setup | No clustering (future issue) |
| Less infrastructure | Limited to single Redis |
| Good for MVP | Stream size is bounded by Redis memory |
| Built-in ordering | Durability depends on Redis persistence settings |
### Upgrade Path
When needed: replace the Redis Streams consumer with a RabbitMQ consumer. The producer interface stays the same; during the migration it emits to both the Stream and the Queue.
### Status
**DESIGNED, NOT YET IMPLEMENTED**

---
## ADR-003: Compensation Transactions Instead of Deletion
### Decision
Never delete transactions. Create compensation (reverse) transactions instead.
### Context
- Financial system requires immutability
- Audit trail must show all changes
- Regulatory compliance (many jurisdictions require this)
- User may reverse a reversal
### Solution
```
Transaction Reversal Flow:
Original Transaction (ID: 100)
├─ amount: 50.00 USD
├─ from_wallet: Cash
├─ to_wallet: Bank
└─ status: "executed"
    └─▶ User requests reversal
        ├─ Create Reversal Transaction (ID: 102)
        │   ├─ amount: 50.00 USD
        │   ├─ from_wallet: Bank (REVERSED)
        │   ├─ to_wallet: Cash (REVERSED)
        │   ├─ type: "reversal"
        │   ├─ original_transaction_id: 100
        │   └─ status: "executed"
        └─ Update Original
            ├─ status: "reversed"
            ├─ reversed_at: now
            └─ reversal_reason: "User requested..."
```
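The flow above can be sketched with plain dataclasses. This is an illustration under assumptions, not the project's model: the `Transaction` class, the default `type="transfer"`, and the `reverse_transaction` helper are hypothetical; only the field names and status values come from this ADR.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    id: int
    amount: Decimal
    from_wallet: str
    to_wallet: str
    status: str = "executed"
    type: str = "transfer"  # assumed name for a regular transaction
    original_transaction_id: Optional[int] = None
    reversed_at: Optional[datetime] = None
    reversal_reason: Optional[str] = None


def reverse_transaction(original: Transaction, new_id: int, reason: str, now=None):
    """Return (updated original, new reversal transaction); nothing is deleted."""
    if original.status != "executed":
        raise ValueError("only executed transactions can be reversed")
    now = now or datetime.now(timezone.utc)
    reversal = Transaction(
        id=new_id,
        amount=original.amount,
        from_wallet=original.to_wallet,  # wallets swapped relative to the original
        to_wallet=original.from_wallet,
        type="reversal",
        original_transaction_id=original.id,
    )
    updated = replace(original, status="reversed", reversed_at=now, reversal_reason=reason)
    return updated, reversal
```

Because the reversal itself is created with status `executed`, it can be passed to `reverse_transaction` again, which is how a reversal of a reversal falls out of the same rule.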
### Benefits
- **Immutability**: No data loss
- **Audit Trail**: See what happened and why
- **Reversals of Reversals**: Can reverse the reversal
- **Compliance**: Meets financial regulations
- **Analytics**: Accurate historical data
### Implementation
```python
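from enum import Enum

# Database
class TransactionStatus(str, Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    EXECUTED = "executed"
    REVERSED = "reversed"

# Fields added to the transaction table
# original_transaction_id: FK self-reference to the reversed transaction
# reversed_at: when it was reversed
# reversal_reason: why it was reversed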
```
### Status
**IMPLEMENTED**

---
## ADR-004: Family-Level Isolation vs Database-Level
### Decision
Implement family isolation at service/API layer (vs database constraints).
### Context
- Easier testing (no DB constraints to work around)
- More flexibility (can support cross-family operations if needed)
- Performance (single query vs complex JOINs)
- Security (defense in depth)
### Solution
```python
# Every query includes a family_id filter
Transaction.query.filter(
    Transaction.family_id == user_context.family_id
)
# RBAC middleware also checks:
RBACEngine.check_family_access(user_context, requested_family_id)
# Service layer validates before operations
WalletService.get_wallet(wallet_id, family_id=context.family_id)
```
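Since service-layer isolation "requires discipline", one way to reduce the risk of a forgotten filter is to make `family_id` a required keyword argument on every lookup. The following in-memory sketch shows the idea; `WalletRepository`, `Wallet`, and `FamilyAccessError` are hypothetical names, not the project's classes.

```python
from dataclasses import dataclass


class FamilyAccessError(PermissionError):
    """Raised when a record is missing or belongs to a different family."""


@dataclass(frozen=True)
class Wallet:
    id: int
    family_id: int
    name: str


class WalletRepository:
    """In-memory stand-in for a table; every read is scoped to one family."""

    def __init__(self, wallets):
        self._by_id = {w.id: w for w in wallets}

    def get_wallet(self, wallet_id: int, *, family_id: int) -> Wallet:
        # family_id is keyword-only and has no default, so callers cannot omit it
        wallet = self._by_id.get(wallet_id)
        if wallet is None or wallet.family_id != family_id:
            # Same error for "missing" and "other family" so IDs are not leaked
            raise FamilyAccessError(f"wallet {wallet_id} not accessible")
        return wallet
```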
### Trade-offs
| Approach | Pros | Cons |
|----------|------|------|
| **Service Layer (Selected)** | Flexible, testable, fast queries | Requires discipline |
| **Database FK** | Enforced by DB | Inflexible, complex queries |
| **Combined** | Both protections | Double overhead |
### Status
**IMPLEMENTED**

---
## ADR-005: Approval Workflow in Domain Model
### Decision
Implement transaction approval as state machine in domain model.
### Context
- High-value transactions need approval
- State transitions must be valid
- Audit trail must show approvals
- Different thresholds per role
### Solution
```
Transaction State Machine:
DRAFT (initial)
└─▶ [Check amount vs threshold]
    ├─ If amount ≤ threshold: EXECUTED (auto-approve)
    └─ If amount > threshold: PENDING_APPROVAL (wait for approval)
PENDING_APPROVAL
├─▶ [Owner approves] → EXECUTED
└─▶ [User cancels] → DRAFT
EXECUTED
└─▶ [User/Owner reverses] → REVERSED (a reversal tx is created)
REVERSED (final state)
└─ Can't transition further
```
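The diagram translates directly into a transition table. The sketch below is one possible encoding; the names `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are illustrative and not taken from the codebase.

```python
# Allowed target states for each current state, as drawn in the diagram
ALLOWED_TRANSITIONS = {
    "draft": {"executed", "pending_approval"},
    "pending_approval": {"executed", "draft"},
    "executed": {"reversed"},
    "reversed": set(),  # final state
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the state machine permits current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"Invalid state transition: {current} -> {target}")
    return target
```

Keeping the rules in one table means adding a state (for example a rejected state, if one is ever needed) is a data change rather than a new branch in every service method.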
### Threshold Rules
```python
APPROVAL_THRESHOLD = 500  # USD, applies to MEMBER
# Child transactions (lower limit: 50 USD)
if role == CHILD and amount > 50:
    status = PENDING_APPROVAL
# Member transactions
if role == MEMBER and amount > APPROVAL_THRESHOLD:
    status = PENDING_APPROVAL
# Adult/Owner: never need approval (auto-execute)
```
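As a self-contained version of these rules, the function below returns the initial status for a new transaction. It is a sketch under assumptions: role names are lowercase strings, amounts are `Decimal` values in USD, and unknown roles are rejected rather than auto-executed (a choice made here, not stated in the ADR).

```python
from decimal import Decimal

# Per-role limits in USD, from the threshold rules above
APPROVAL_LIMITS = {"child": Decimal("50"), "member": Decimal("500")}
AUTO_EXECUTE_ROLES = {"adult", "owner"}


def initial_status(role: str, amount: Decimal) -> str:
    """Return 'executed' or 'pending_approval' for a newly submitted transaction."""
    if role in AUTO_EXECUTE_ROLES:
        return "executed"
    if role not in APPROVAL_LIMITS:
        raise ValueError(f"unknown role: {role}")  # fail closed on unexpected roles
    # Strictly greater than the limit requires approval, matching `amount > limit`
    return "pending_approval" if amount > APPROVAL_LIMITS[role] else "executed"
```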
### Implementation
```python
from datetime import datetime
from enum import Enum

# Schema
class TransactionStatus(str, Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    EXECUTED = "executed"
    REVERSED = "reversed"

# Fields
status: TransactionStatus
confirmation_required: bool
confirmation_token: str  # Verifies that the approval is genuine
approved_by_id: int
approved_at: datetime

# Service layer validates state transitions
class TransactionService:
    def confirm_transaction(self, tx):
        if tx.status != TransactionStatus.PENDING_APPROVAL:
            raise ValueError("Invalid state transition")
```
### Status
**IMPLEMENTED**

---
## ADR-006: HS256 for MVP, RS256 for Production
### Decision
Use symmetric HMAC-SHA256 (HS256) for MVP, upgrade to asymmetric RS256 for production.
### Context
- HS256: Same secret for signing & verification (simple)
- RS256: Private key to sign, public key to verify (scalable)
- MVP: Simple deployment needed
- Production: Multiple API instances need to verify tokens
### Solution
```python
import jwt  # assuming PyJWT

# MVP: HS256 (symmetric)
secret = "shared-secret"  # placeholder; load from configuration in practice
jwt_manager = JWTManager(secret_key=secret)
token = jwt.encode(payload, secret, algorithm="HS256")
verified = jwt.decode(token, secret, algorithms=["HS256"])

# Production: RS256 (asymmetric)
with open("private.pem") as f:
    private_key = f.read()
with open("public.pem") as f:
    public_key = f.read()
token = jwt.encode(payload, private_key, algorithm="RS256")
verified = jwt.decode(token, public_key, algorithms=["RS256"])
```
### Migration Path
1. Generate RSA key pair
2. Update JWT manager to accept algorithm config
3. Deploy new version that validates RS256 while still accepting HS256 (backward compatible)
4. Stop issuing HS256 tokens
5. HS256 tokens expire naturally
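During the window where both algorithms are accepted, the verifier should choose the key from the token's declared algorithm and then pin exactly that one algorithm for verification. Otherwise an HS256 token could end up being checked with the RSA public key as the shared secret. The sketch below uses only the standard library to select the parameters; `verification_params` and its arguments are hypothetical names, and the returned pair is intended for a call such as `jwt.decode(token, key, algorithms=algorithms)`.

```python
import base64
import json


def unverified_alg(token: str) -> str:
    """Read the `alg` field from the JWT header without verifying anything."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))["alg"]


def verification_params(token: str, hs_secret: str, rs_public_key: str,
                        accept_hs256: bool = True):
    """Return (key, algorithms) so each token is verified with one pinned algorithm."""
    alg = unverified_alg(token)
    if alg == "RS256":
        return rs_public_key, ["RS256"]
    if alg == "HS256" and accept_hs256:
        return hs_secret, ["HS256"]
    raise ValueError(f"algorithm not accepted: {alg}")
```

Once the last HS256 token has expired (step 5), setting `accept_hs256=False` closes the legacy path.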
### Status
**HS256 IMPLEMENTED, RS256 READY**

---
## ADR-007: Telegram Binding via Temporary Codes
### Decision
Use temporary binding codes instead of direct token requests.
### Context
- Security: Code has limited lifetime & single use
- User Experience: Simple flow (click link)
- Phishing Prevention: User confirms on web, not just in Telegram
- Bot doesn't receive sensitive tokens
### Solution
```
Flow:
1. User: /start
2. Bot: Generate code (10-min TTL)
3. Bot: Send link with code
4. User: Clicks link (authenticate on web)
5. Web: Confirm binding, create TelegramIdentity
6. Web: Issue JWT for bot to use
7. Bot: Stores JWT in Redis
8. Bot: Uses JWT for API calls
```
### Code Generation
```python
import secrets

code = secrets.token_urlsafe(24)  # 24 random bytes -> 32-char URL-safe string
# Store in Redis with a 10-min TTL
redis.setex(f"telegram:code:{code}", 600, chat_id)
# Generate link
url = f"https://app.com/auth/telegram?code={code}&chat_id={chat_id}"
```
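The snippet above covers issuing a code; the single-use property depends on how the code is redeemed. The in-memory sketch below shows the redeem side with an injectable clock for testing. `BindingCodeStore` is a hypothetical stand-in for the Redis-backed store; with Redis, an atomic get-and-delete such as the `GETDEL` command (Redis 6.2+) would play the role of `dict.pop` here.

```python
import secrets
import time


class BindingCodeStore:
    """Issues short-lived binding codes that can be redeemed at most once."""

    def __init__(self, ttl_seconds: int = 600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._codes = {}  # code -> (chat_id, expires_at)

    def issue(self, chat_id: int) -> str:
        code = secrets.token_urlsafe(24)
        self._codes[code] = (chat_id, self._clock() + self._ttl)
        return code

    def redeem(self, code: str):
        """Return the bound chat_id, or None if the code is unknown, used, or expired."""
        entry = self._codes.pop(code, None)  # pop removes it, so a second redeem fails
        if entry is None:
            return None
        chat_id, expires_at = entry
        return chat_id if self._clock() < expires_at else None
```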
### Status
**IMPLEMENTED**

---
## ADR-008: Service Token for Bot-to-API Communication
### Decision
Issue a separate service token (not a user token) for bot API requests.
### Context
- Bot needs to make requests independently (not as specific user)
- Different permissions than user tokens
- Different expiry (1 year vs 15 min)
- Can be rotated independently
### Solution
```python
# Service token payload
service_token_payload = {
    "sub": "service:telegram_bot",
    "type": "service",
    "iat": 1702237800,
    "exp": 1733773800,  # iat + 1 year
}
# Bot sends the service token in request headers:
#   Authorization: Bearer <service_token>
#   X-Client-Id: telegram_bot
```
### Use Cases
- Service token: Schedule reminders, send notifications
- User token: Create transaction as specific user
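To keep the two token kinds from being used interchangeably, endpoints can check the `type` claim after signature verification. The sketch below builds service claims with the one-year expiry and enforces the expected type; `build_service_claims` and `require_token_type` are hypothetical helpers, and the `"user"` type value is an assumption since the ADR only shows `"service"`.

```python
import time

SERVICE_TOKEN_TTL = 365 * 24 * 60 * 60  # 1 year, in seconds
USER_TOKEN_TTL = 15 * 60                # 15 minutes, in seconds


def build_service_claims(service_name: str, now=None) -> dict:
    """Return the claims for a service token issued at `now` (defaults to current time)."""
    issued_at = int(time.time() if now is None else now)
    return {
        "sub": f"service:{service_name}",
        "type": "service",
        "iat": issued_at,
        "exp": issued_at + SERVICE_TOKEN_TTL,
    }


def require_token_type(claims: dict, expected: str) -> None:
    """Raise if already-verified claims are not of the expected token type."""
    if claims.get("type") != expected:
        raise PermissionError(f"expected a {expected} token")
```

An endpoint that creates a transaction as a specific user would call `require_token_type(claims, "user")`, while a reminder scheduler would require `"service"`.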
### Status
**IMPLEMENTED**

---
## ADR-009: Middleware Order Matters
### Decision
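Security middleware must execute in a specific order.
### Context
- FastAPI (via Starlette) runs middleware in reverse registration order: the last middleware added is the outermost and sees the request first
- Each middleware depends on state set up by the ones that ran before it
- Wrong order = security bypass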
### Solution
```
Registration order (executes in reverse):
1. RequestLoggingMiddleware (last to execute)
2. RBACMiddleware
3. JWTAuthenticationMiddleware
4. HMACVerificationMiddleware
5. RateLimitMiddleware
6. SecurityHeadersMiddleware (first to execute)

Execution flow:
SecurityHeaders
└─ Add HSTS, X-Frame-Options, etc.
RateLimit
├─ Check IP-based rate limit
└─ Increment counter in Redis
HMACVerification
├─ Verify X-Signature
├─ Check timestamp freshness
└─ Prevent replay attacks
JWTAuthentication
├─ Extract token from Authorization header
├─ Verify signature & expiration
└─ Store user context in request.state
RBAC
├─ Load user role
├─ Verify family access
└─ Store permissions
RequestLogging
├─ Log all requests
└─ Record response time
```
### Implementation
```python
from fastapi import FastAPI

def add_security_middleware(app: FastAPI, redis_client, db_session):
    # Order matters: the last middleware added runs first
    app.add_middleware(RequestLoggingMiddleware)
    app.add_middleware(RBACMiddleware, db_session=db_session)
    app.add_middleware(JWTAuthenticationMiddleware)
    app.add_middleware(HMACVerificationMiddleware, redis_client=redis_client)
    app.add_middleware(RateLimitMiddleware, redis_client=redis_client)
    app.add_middleware(SecurityHeadersMiddleware)
```
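The reverse ordering is easier to see in a small model. The sketch below is a simplified simulation of how a middleware stack is built by wrapping, not Starlette's actual code: each registered middleware wraps everything registered before it, so the last one added ends up outermost. `build_stack` and `_wrap` are illustrative names.

```python
def _wrap(name, inner):
    """Return a handler that records its name, then calls the wrapped handler."""
    def handler(trace):
        trace.append(name)
        return inner(trace)
    return handler


def build_stack(registered, endpoint):
    """Wrap `endpoint` in registration order; the last registered is outermost."""
    app = endpoint
    for name in registered:
        app = _wrap(name, app)
    return app


REGISTERED = [
    "RequestLogging",
    "RBAC",
    "JWTAuthentication",
    "HMACVerification",
    "RateLimit",
    "SecurityHeaders",
]

stack = build_stack(REGISTERED, lambda trace: trace)
print(stack([]))
# ['SecurityHeaders', 'RateLimit', 'HMACVerification',
#  'JWTAuthentication', 'RBAC', 'RequestLogging']
```

A check like this could also serve as a regression test: if someone reorders the `add_middleware` calls, the expected execution order no longer matches.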
### Status
**IMPLEMENTED**

---
## ADR-010: Event Logging is Mandatory
### Decision
Every data modification is logged to the event_log table.
### Context
- Regulatory compliance (financial systems)
- Audit trail for disputes
- Debugging (understand what happened)
- User transparency (show activity history)
### Solution
```python
from datetime import datetime, timezone

# Every service method logs events
event = EventLog(
    family_id=family_id,
    entity_type="transaction",
    entity_id=tx_id,
    action="create",  # create|update|delete|confirm|execute|reverse
    actor_id=user_id,
    old_values={"balance": 100},
    new_values={"balance": 50},
    ip_address=request.client.host,
    user_agent=request.headers.get("user-agent"),
    reason="User requested cancellation",
    created_at=datetime.now(timezone.utc),  # utcnow() is deprecated since Python 3.12
)
db.add(event)
```
### Fields Logged
```
EventLog:
├─ entity_type: What was modified (transaction, wallet, budget)
├─ entity_id: Which record (transaction #123)
├─ action: What happened (create, update, delete, reverse)
├─ actor_id: Who did it (user_id)
├─ old_values: Before state (JSON)
├─ new_values: After state (JSON)
├─ ip_address: Where from
├─ user_agent: What client
├─ reason: Why (for deletions)
└─ created_at: When
```
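The `old_values` and `new_values` fields are most useful when they contain only what actually changed, as in the `balance` example. A small helper could compute both from the before and after state of a record. This is a sketch assuming flat dictionaries of JSON-serializable values; `changed_values` is a hypothetical name, and whether the project stores full snapshots or diffs is not stated in this ADR.

```python
_MISSING = object()  # distinguishes "key absent" from "value is None"


def changed_values(before: dict, after: dict):
    """Return (old_values, new_values) restricted to keys whose value changed."""
    keys = sorted(
        k for k in before.keys() | after.keys()
        if before.get(k, _MISSING) != after.get(k, _MISSING)
    )
    old_values = {k: before[k] for k in keys if k in before}
    new_values = {k: after[k] for k in keys if k in after}
    return old_values, new_values
```

For a create action `before` would be empty, so `old_values` comes out as `{}` and `new_values` holds the full record.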
### Access Control
```
# Who can view event_log?
├─ Owner: All events in family
├─ Adult: All events in family
├─ Member: Only own transactions' events
├─ Child: Very limited
└─ Read-Only: Selected events (audit/observer)
```
### Status
**IMPLEMENTED**

---
## Summary Table
| ADR | Title | Status | Risk | Notes |
|-----|-------|--------|------|-------|
| 001 | JWT + HMAC | ✅ | Low | Dual auth provides defense-in-depth |
| 002 | Redis Streams | ⏳ | Medium | Upgrade path to RabbitMQ planned |
| 003 | Compensation Tx | ✅ | Low | Immutability requirement met |
| 004 | Family Isolation | ✅ | Low | Service-layer isolation + RBAC |
| 005 | Approval Workflow | ✅ | Low | State machine properly designed |
| 006 | HS256→RS256 | ✅ | Low | Migration path clear |
| 007 | Binding Codes | ✅ | Low | Secure temporary code flow |
| 008 | Service Tokens | ✅ | Low | Separate identity for bot |
| 009 | Middleware Order | ✅ | Critical | Correctly implemented |
| 010 | Event Logging | ✅ | Low | Audit trail complete |

---

**Document Version:** 1.0

**Last Updated:** 2025-12-10

**Review Frequency:** Quarterly