Microservice Design
PDI Reference Architecture APIs Implementation Guide

Lula Direct × PDI Loyalty Service

Production-grade NestJS microservice for real-time loyalty reward calculation and session management

Module Architecture

The service is organized around six core modules, each responsible for specific business concerns. Dependencies flow from concrete implementations up to the application layer.

AppModule (Root · Entry Point)
  • Providers: AppService
  • Controllers: AppController
  • Imports: ConfigModule, PdiModule, SessionModule, LoyaltyModule, HealthModule

ConfigModule (Global · Environment)
  • Providers: ConfigService, EnvValidator
  • Exports: ConfigService

PdiModule (API Client · Auth · Resilience)
  • Providers: PdiClientService, PdiAuthService, CircuitBreakerService
  • Imports: ConfigModule, HttpModule
  • Exports: PdiClientService

SessionModule (State Machine · Repository)
  • Providers: SessionService, SessionRepository, StateMachine
  • Imports: TypeOrmModule, ConfigModule
  • Exports: SessionService

LoyaltyModule (Core Business Logic)
  • Providers: LoyaltyService, RewardCalculator
  • Controllers: LoyaltyController
  • Imports: SessionModule, PdiModule

HealthModule (Probes · Monitoring)
  • Providers: HealthService, PdiHealthIndicator
  • Controllers: HealthController
  • Imports: TerminusModule

Dependency Flow

Module Dependency Hierarchy

ConfigModule (Global · 2 providers)
  ↓ imports globally
PdiModule (API Client · Auth · Circuit Breaker · 3 providers), SessionModule (Repository · State Machine · 3 providers), HealthModule (Probes · PDI Indicator)
  ↓ imports + exports
LoyaltyModule (Controller · Service · Calculator · 2 providers)
  ↓ orchestrates
AppModule (Root · Imports all 5 modules)

Core Concepts

NestJS Modules

Each module encapsulates related providers and controllers. The dependency injection system manages instantiation and sharing.

  • ConfigModule loaded globally
  • Feature modules use imports/exports for composition
  • Lazy-loading not required at this scale

Providers & Injection

Services are stateless where possible. Database operations use repositories. External APIs use client services with error handling.

  • Singleton scope for database pools
  • Request scope for HTTP contexts
  • Transient for stateless utilities

Controllers & Routing

Minimal logic in controllers. All business logic delegated to services. Request/response validation via DTOs and pipes.

  • REST endpoints for loyalty operations
  • Health checks for monitoring
  • Request body validation with class-validator

Project Layout

Source code follows NestJS conventions with feature-based module organization.

loyalty-service (NestJS Microservice)

src/
  main.ts (1.2 KB)
  app.module.ts (2.1 KB)
  config/
    config.module.ts (0.8 KB)
    env.validation.ts (1.5 KB)
    constants.ts (0.6 KB)
  pdi/
    pdi.module.ts (1.1 KB)
    pdi-client.service.ts (4.2 KB)
    pdi-auth.service.ts (3.8 KB)
    circuit-breaker.service.ts (2.9 KB)
    pdi.types.ts (1.4 KB)
    pdi.constants.ts (0.5 KB)
  session/
    session.module.ts (0.9 KB)
    session.service.ts (5.1 KB)
    session.repository.ts (2.3 KB)
    session.entity.ts (1.8 KB)
    session-state.machine.ts (2.6 KB)
    session.types.ts (0.7 KB)
  loyalty/
    loyalty.module.ts (1.0 KB)
    loyalty.controller.ts (3.4 KB)
    loyalty.service.ts (4.7 KB)
    reward.calculator.ts (2.2 KB)
    dto/
      identify.dto.ts (0.8 KB)
      calculate-rewards.dto.ts (1.2 KB)
      finalize.dto.ts (0.7 KB)
  health/
    health.module.ts (0.6 KB)
    health.controller.ts (1.9 KB)
    pdi-health.indicator.ts (1.3 KB)
  common/
    app.filter.ts (1.6 KB)
    logging.interceptor.ts (2.0 KB)
    request.guard.ts (1.1 KB)
test/
  app.e2e.spec.ts (3.5 KB)
jest.config.js (0.4 KB)
Dockerfile (0.9 KB)
docker-compose.yml (1.8 KB)
.env.example (0.5 KB)
package.json (2.4 KB)
tsconfig.json (0.6 KB)
nest-cli.json (0.3 KB)

File Descriptions

src/main.ts

Application entry point. Bootstraps NestJS app, configures middleware, and starts the HTTP server.

src/config/

Environment validation and global configuration. Loaded early before any modules initialize.

src/pdi/

PDI API client and auth logic. Includes circuit breaker for resilience and OAuth2 token management.

src/session/

Session entity, repository, and state machine. Handles session lifecycle from creation to finalization.

src/loyalty/

Core business logic for loyalty operations. Controllers expose HTTP endpoints, services handle calculations.

src/health/

Health check endpoints for Kubernetes and monitoring systems. Includes PDI endpoint status indicator.

PDI API Integration

The loyalty service communicates with the PDI Point of Sale system through a secure, fault-tolerant client.

OAuth2 Authentication Flow

OAuth2 Client Credentials Pipeline

  1. Client Request: a loyalty API call requires auth
  2. Check Cache: in-memory token lookup (O(1))
  3. Validate TTL: 55 min TTL with 5 min proactive refresh
  4. Token Exchange: POST /oauth/token with the client_credentials grant
  5. Attach Bearer: set the Authorization header and proceed to PDI

  • Token TTL: 55m
  • Proactive refresh: 5m
  • Retries: limited, with exponential backoff
  • Cache lookup: O(1)
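
The cache and TTL steps of the pipeline can be sketched as follows. This is a minimal illustration, not the service's actual code: the class and function names are hypothetical, and it assumes that "5 min proactive refresh" means a cached token is treated as stale once fewer than five minutes of its 55-minute TTL remain. Time is passed in explicitly so the logic stays easy to test.

```typescript
// Values from the pipeline above; all names here are hypothetical.
const TOKEN_TTL_MS = 55 * 60 * 1000; // cache TTL: 55 minutes
const REFRESH_WINDOW_MS = 5 * 60 * 1000; // proactive refresh window: 5 minutes

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

class TokenCache {
  private cached: CachedToken | null = null;

  // Store a freshly exchanged token, stamped with the cache TTL.
  store(accessToken: string, nowMs: number): void {
    this.cached = { accessToken, expiresAtMs: nowMs + TOKEN_TTL_MS };
  }

  // Return the token only while it is outside the proactive refresh window.
  // A null result tells the caller to perform the token exchange.
  get(nowMs: number): string | null {
    if (this.cached === null) return null;
    const remainingMs = this.cached.expiresAtMs - nowMs;
    return remainingMs > REFRESH_WINDOW_MS ? this.cached.accessToken : null;
  }
}

// Step 5: attach the token as a Bearer credential.
function authorizationHeader(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

const tokenCache = new TokenCache();
tokenCache.store("example-token", 0);
console.log(tokenCache.get(49 * 60 * 1000)); // served from cache (6 min remain)
console.log(tokenCache.get(51 * 60 * 1000)); // null: inside the refresh window
```

A single cached entry keeps the lookup O(1); the actual exchange call and its retry policy are left out of the sketch.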

API Endpoints

Endpoint | Method | Purpose | Timeout
/identify | POST | Initiate loyalty session, validate customer | 5s
/calculateRewards | POST | Calculate earned points for transaction | 8s
/finalize | POST | Commit rewards to customer account | 10s
/getSessionStatus | GET | Query session state and earned points | 3s

Circuit Breaker States

Circuit Breaker State Machine

  • CLOSED (Healthy) → OPEN (Fail Fast) after 5 failures
  • OPEN → HALF_OPEN (Testing) after the 30s timeout
  • HALF_OPEN → CLOSED (Recovered) on success

  • Failure threshold: 5
  • Open timeout: 30s
  • Queue items: 1K max
  • Retry attempts: 3×
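
The three states and their thresholds can be sketched as a small class. This is an illustrative sketch rather than the contents of circuit-breaker.service.ts. The class and method names are hypothetical, and reopening on a failed HALF_OPEN trial is an assumption, because the diagram only shows the success path out of HALF_OPEN.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly openTimeoutMs: number;

  constructor(failureThreshold = 5, openTimeoutMs = 30_000) {
    this.failureThreshold = failureThreshold;
    this.openTimeoutMs = openTimeoutMs;
  }

  // An OPEN breaker moves to HALF_OPEN once the open timeout has elapsed.
  getState(nowMs: number): BreakerState {
    if (this.state === "OPEN" && nowMs - this.openedAtMs >= this.openTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  // Requests pass in CLOSED and HALF_OPEN; OPEN fails fast.
  canRequest(nowMs: number): boolean {
    return this.getState(nowMs) !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure(nowMs: number): void {
    const state = this.getState(nowMs);
    if (state === "OPEN") return; // already failing fast
    this.failures += 1;
    // Assumption: a failed trial call in HALF_OPEN reopens the breaker at once.
    if (state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAtMs = nowMs;
    }
  }
}

const breaker = new CircuitBreaker();
for (let i = 0; i < 5; i++) breaker.recordFailure(0);
console.log(breaker.getState(1_000)); // OPEN
console.log(breaker.getState(30_000)); // HALF_OPEN
```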

Error Handling

Timeout Errors

If PDI does not respond within the endpoint's timeout, the circuit breaker records a failure; once the failure threshold (5) is reached, it transitions to OPEN.

  • identify: 5s
  • calculateRewards: 8s
  • finalize: 10s

Authentication Errors

401/403 responses trigger a token refresh. If the refresh fails, the service returns 503 with a Retry-After header.

  • Exponential backoff on token refresh
  • Max 3 retries before giving up
  • Return 503 to caller

Business Errors

PDI may reject operations (invalid customer, insufficient points). These propagate to client as 400 errors with details.

  • Preserve error codes from PDI
  • Include error message in response
  • Log for audit trail
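
The timeout, authentication, and business error classes described above can be combined into one dispatch function. The sketch below is hypothetical: the type names, the "UNKNOWN" fallback code, and the choice to count 5xx responses as circuit breaker failures are assumptions, not behavior stated in this guide.

```typescript
type PdiFailure =
  | { kind: "timeout" }
  | { kind: "http"; status: number; code?: string; message?: string };

type FailureHandling =
  | { action: "record_breaker_failure" }
  | { action: "refresh_token" }
  | { action: "respond"; httpStatus: 400; body: { code: string; message: string } };

function handlePdiFailure(failure: PdiFailure): FailureHandling {
  // Timeouts feed the circuit breaker.
  if (failure.kind === "timeout") return { action: "record_breaker_failure" };

  // 401/403: refresh the token before deciding whether to return 503.
  if (failure.status === 401 || failure.status === 403) {
    return { action: "refresh_token" };
  }

  // Other 4xx: business rejection, surfaced as 400 with PDI's code preserved.
  if (failure.status >= 400 && failure.status < 500) {
    return {
      action: "respond",
      httpStatus: 400,
      body: {
        code: failure.code ?? "UNKNOWN",
        message: failure.message ?? "Request rejected by PDI",
      },
    };
  }

  // Assumption: anything else (for example 5xx) counts as a breaker failure.
  return { action: "record_breaker_failure" };
}

console.log(
  handlePdiFailure({ kind: "http", status: 422, code: "INVALID_CUSTOMER", message: "Unknown customer" }),
);
```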

Cascading Failures

When the circuit breaker is OPEN, the service queues operations and retries them with exponential backoff.

  • In-memory queue up to 1000 items
  • Worker retries every 5-30s
  • Persistence via database if needed
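
A bounded queue with capped exponential backoff might look like the sketch below. The names are hypothetical, and the doubling schedule (5s, 10s, 20s, then capped at 30s) is one assumed way to stay inside the 5-30s retry range given above.

```typescript
const MAX_QUEUE_ITEMS = 1000;
const MIN_RETRY_DELAY_MS = 5_000;
const MAX_RETRY_DELAY_MS = 30_000;

// Delay before retry number `attempt` (0-based), doubling up to the cap.
function retryDelayMs(attempt: number): number {
  return Math.min(MIN_RETRY_DELAY_MS * 2 ** attempt, MAX_RETRY_DELAY_MS);
}

interface QueuedOperation<T> {
  payload: T;
  attempt: number;
}

class RetryQueue<T> {
  private items: QueuedOperation<T>[] = [];

  // Returns false when the queue is full, so the caller can reject the
  // operation or fall back to database persistence.
  enqueue(payload: T, attempt = 0): boolean {
    if (this.items.length >= MAX_QUEUE_ITEMS) return false;
    this.items.push({ payload, attempt });
    return true;
  }

  // Oldest operation first; undefined when the queue is empty.
  dequeue(): QueuedOperation<T> | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

const retryQueue = new RetryQueue<string>();
retryQueue.enqueue("finalize:sess_abc123");
console.log(retryQueue.size, retryDelayMs(0), retryDelayMs(3));
```

Returning a boolean from enqueue keeps the overflow decision with the caller instead of silently dropping work.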

Request/Response Examples

→ Request POST /loyalty/identify

{
  "customerId": "CUST-12345",
  "channelId": "POS-001",
  "transactionId": "TXN-67890"
}

← Response 200 OK

{
  "sessionId": "sess_abc123",
  "status": "active",
  "customerName": "John Doe",
  "currentPoints": 5000,
  "tier": "Gold"
}

→ Request POST /loyalty/calculate-rewards

{
  "sessionId": "sess_abc123",
  "transactionAmount": 150.00,
  "items": [
    { "sku": "SKU-001", "qty": 2 }
  ]
}

← Response 200 OK

{
  "earnedPoints": 450,
  "multiplier": 3.0,
  "bonusPoints": 150,
  "breakdown": {
    "basePoints": 300,
    "tierBonus": 150
  }
}
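
The guide does not state the reward formula, so the sketch below is only one way to arrive at the numbers in the response above. The rates (2 base points per currency unit, a 50% bonus for the Gold tier) and all names are assumptions chosen to reproduce the example; the real rules live in reward.calculator.ts and PDI.

```typescript
// Hypothetical rates, chosen only to reproduce the example response.
const BASE_POINTS_PER_UNIT = 2;
const TIER_BONUS_RATE: Record<string, number> = { Gold: 0.5 };

interface RewardResult {
  earnedPoints: number;
  multiplier: number;
  bonusPoints: number;
  breakdown: { basePoints: number; tierBonus: number };
}

function calculateRewards(transactionAmount: number, tier: string): RewardResult {
  const basePoints = Math.round(transactionAmount * BASE_POINTS_PER_UNIT);
  // Tiers without a configured rate earn no bonus.
  const tierBonus = Math.round(basePoints * (TIER_BONUS_RATE[tier] ?? 0));
  const earnedPoints = basePoints + tierBonus;
  return {
    earnedPoints,
    // Effective points per currency unit.
    multiplier: transactionAmount > 0 ? earnedPoints / transactionAmount : 0,
    bonusPoints: tierBonus,
    breakdown: { basePoints, tierBonus },
  };
}

// 150.00 at Gold: 300 base + 150 tier bonus = 450 points, multiplier 3.
console.log(calculateRewards(150, "Gold"));
```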

→ Request POST /loyalty/finalize

{
  "sessionId": "sess_abc123",
  "transactionId": "TXN-67890",
  "finalAmount": 150.00,
  "applyRewards": true
}

← Response 200 OK

{
  "sessionId": "sess_abc123",
  "status": "finalized",
  "earnedPoints": 450,
  "newBalance": 5450,
  "receiptId": "RCP-2025-001"
}

Session Lifecycle

Sessions track the state of a loyalty transaction from customer identification through reward finalization.

State Machine

Session State Transitions

  • Pending (Created) → Active (Identified) on identify()
  • Active → Pending (Rewards Calc) on calculate()
  • Pending (Rewards Calc) → Finalized (Success) on finalize()
  • → Failed (Error) on error / timeout

  • Session TTL: 24h
  • Cleanup interval: 15m
  • Archive after: 30d
  • Lock timeout: 5s
  • Idempotency window: 10m
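
The transitions can be expressed as a lookup table. This sketch uses hypothetical identifiers: the diagram labels two different states "Pending", so the post-calculation state is named REWARDS_PENDING here to keep them apart, and allowing the failure transition from every non-terminal state is an assumption.

```typescript
type SessionState = "PENDING" | "ACTIVE" | "REWARDS_PENDING" | "FINALIZED" | "FAILED";
type SessionEvent = "identify" | "calculate" | "finalize" | "fail";

// Allowed transitions; FINALIZED and FAILED are terminal.
const TRANSITIONS: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  PENDING: { identify: "ACTIVE", fail: "FAILED" },
  ACTIVE: { calculate: "REWARDS_PENDING", fail: "FAILED" },
  REWARDS_PENDING: { finalize: "FINALIZED", fail: "FAILED" },
  FINALIZED: {},
  FAILED: {},
};

function nextState(current: SessionState, event: SessionEvent): SessionState {
  const target = TRANSITIONS[current][event];
  if (target === undefined) {
    throw new Error(`Invalid transition: ${event} from ${current}`);
  }
  return target;
}

let sessionState: SessionState = "PENDING";
sessionState = nextState(sessionState, "identify");
sessionState = nextState(sessionState, "calculate");
sessionState = nextState(sessionState, "finalize");
console.log(sessionState); // FINALIZED
```

Keeping the table as data makes illegal transitions (for example finalize from PENDING) fail loudly instead of corrupting session state.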

Session Entity

export interface Session {
  id: string;                 // UUID
  customerId: string;          // PDI customer ID
  status: SessionStatus;     // State enum
  transactionId: string;       // Unique transaction ref
  earnedPoints: number;       // Total earned this session
  redeemedPoints: number;     // If redeemed
  pdiSessionId: string;       // Remote session handle
  createdAt: Date;            // Session start
  expiresAt: Date;            // TTL
  finalizedAt: Date | null;   // When committed
  metadata: Record<string, any>; // Custom data
}

TTL Strategy: Sessions expire after 24 hours by default. A background job runs every 15 minutes to clean up expired sessions. Finalized sessions are archived after 30 days.

Concurrent Session Handling

Database Locking

Sessions use pessimistic locking during state transitions to prevent race conditions.

  • Row-level lock during update
  • 5s acquisition timeout
  • Automatic release on error

Idempotency Keys

Duplicate requests (same transactionId) return cached result instead of creating duplicate sessions.

  • Cache by transactionId + timestamp
  • 10 minute window
  • Stored in Redis
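
The idempotency window can be illustrated with an in-memory stand-in for Redis. The class below is a hypothetical sketch: it uses a Map instead of a Redis client, keys entries by transactionId alone, and keeps the stored timestamp only to enforce the 10-minute window.

```typescript
const IDEMPOTENCY_WINDOW_MS = 10 * 60 * 1000;

class IdempotencyCache<T> {
  // In-memory stand-in; the service described above stores entries in Redis.
  private entries = new Map<string, { result: T; storedAtMs: number }>();

  // Return the cached result for a duplicate request, or undefined if the
  // transaction is new or its entry has aged out of the window.
  get(transactionId: string, nowMs: number): T | undefined {
    const entry = this.entries.get(transactionId);
    if (entry === undefined) return undefined;
    if (nowMs - entry.storedAtMs >= IDEMPOTENCY_WINDOW_MS) {
      this.entries.delete(transactionId);
      return undefined;
    }
    return entry.result;
  }

  set(transactionId: string, result: T, nowMs: number): void {
    this.entries.set(transactionId, { result, storedAtMs: nowMs });
  }
}

const idempotency = new IdempotencyCache<{ sessionId: string }>();
idempotency.set("TXN-67890", { sessionId: "sess_abc123" }, 0);
console.log(idempotency.get("TXN-67890", 60_000)); // cached result
console.log(idempotency.get("TXN-67890", 11 * 60_000)); // undefined: window elapsed
```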

Optimistic Concurrency

Sessions include a version field. Updates fail if version doesn't match, requiring retry.

  • Version incremented on each save
  • Client includes version in update
  • Prevents lost updates
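
A version check of this kind can be sketched with an in-memory store. The store, error class, and field names below are hypothetical; in the service the same check would normally be enforced by the database update itself.

```typescript
interface VersionedSession {
  id: string;
  status: string;
  version: number;
}

class VersionConflictError extends Error {}

class InMemorySessionStore {
  private rows = new Map<string, VersionedSession>();

  insert(id: string, status: string): VersionedSession {
    const row: VersionedSession = { id, status, version: 1 };
    this.rows.set(id, row);
    return { ...row };
  }

  // Apply the update only if the caller saw the latest version.
  update(id: string, expectedVersion: number, status: string): VersionedSession {
    const row = this.rows.get(id);
    if (row === undefined) throw new Error(`Session ${id} not found`);
    if (row.version !== expectedVersion) {
      throw new VersionConflictError(
        `Session ${id} is at version ${row.version}, expected ${expectedVersion}`,
      );
    }
    const updated: VersionedSession = { ...row, status, version: row.version + 1 };
    this.rows.set(id, updated);
    return { ...updated };
  }
}

const sessionStore = new InMemorySessionStore();
const created = sessionStore.insert("sess_abc123", "active");
const saved = sessionStore.update("sess_abc123", created.version, "finalized");
console.log(saved.version); // 2
```

A second writer still holding version 1 would now get a VersionConflictError and must reload before retrying, which is what prevents the lost update.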

Transaction Isolation

All database operations run in serializable isolation for strong consistency.

  • Prevents dirty reads
  • Ensures data integrity
  • Slight performance cost

Cleanup Jobs

Job | Schedule | Action
Expire Sessions | Every 15 minutes | Mark expired sessions as FAILED, soft delete
Archive Finalized | Daily at 2 AM UTC | Move 30+ day old sessions to archive table
Clear Cache | Every hour | Remove idempotency entries older than 10 minutes
Reconciliation | Daily at 1 AM UTC | Verify finalized sessions against PDI records
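
The selection logic behind the first two jobs can be sketched as a pure function. The row shape and function name are hypothetical, and the sketch only decides which session IDs each job would touch; the actual updates and scheduling are out of scope.

```typescript
interface SessionRow {
  id: string;
  status: "ACTIVE" | "FINALIZED" | "FAILED";
  expiresAt: Date;
  finalizedAt: Date | null;
}

const ARCHIVE_AFTER_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

function planCleanup(rows: SessionRow[], now: Date): { expire: string[]; archive: string[] } {
  const expire: string[] = [];
  const archive: string[] = [];
  for (const row of rows) {
    if (row.status === "FINALIZED" && row.finalizedAt !== null) {
      // Archive Finalized: finalized at least 30 days ago.
      if (now.getTime() - row.finalizedAt.getTime() >= ARCHIVE_AFTER_MS) {
        archive.push(row.id);
      }
    } else if (row.status === "ACTIVE" && row.expiresAt.getTime() <= now.getTime()) {
      // Expire Sessions: past expiresAt and never finalized.
      expire.push(row.id);
    }
  }
  return { expire, archive };
}

const cleanupNow = new Date("2025-03-16T00:00:00Z");
console.log(
  planCleanup(
    [
      { id: "s1", status: "ACTIVE", expiresAt: new Date("2025-03-15T00:00:00Z"), finalizedAt: null },
      { id: "s2", status: "FINALIZED", expiresAt: new Date("2025-02-01T00:00:00Z"), finalizedAt: new Date("2025-02-01T00:00:00Z") },
    ],
    cleanupNow,
  ),
);
```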

Structured Logging

All logs are JSON-formatted with consistent fields for easy parsing and aggregation.

// Example log entry
{
  "timestamp": "2025-03-16T14:30:45.123Z",
  "level": "INFO",
  "service": "loyalty-service",
  "context": "LoyaltyController",
  "message": "Session finalized successfully",
  "sessionId": "sess_abc123",
  "customerId": "CUST-12345",
  "duration": 1250,
  "earnedPoints": 450,
  "traceId": "trace-xyz789",
  "spanId": "span-abc456"
}

Log Levels: DEBUG (detailed state info), INFO (successful operations), WARN (recoverable issues), ERROR (failures requiring attention), FATAL (service-level outages).
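
A formatter producing entries of this shape, plus a minimum-level filter, could look like the sketch below. The function names are hypothetical, and the level ordering is assumed to follow the order in which the levels are listed above.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR" | "FATAL";

// Severity order, lowest to highest.
const LEVEL_ORDER: LogLevel[] = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"];

// True when `level` is at or above the configured minimum.
function shouldLog(level: LogLevel, minLevel: LogLevel): boolean {
  return LEVEL_ORDER.indexOf(level) >= LEVEL_ORDER.indexOf(minLevel);
}

// Serialize one entry as a single JSON line with the common fields first.
function formatLog(
  level: LogLevel,
  context: string,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    service: "loyalty-service",
    context,
    message,
    ...fields,
  });
}

if (shouldLog("INFO", "INFO")) {
  console.log(
    formatLog("INFO", "LoyaltyController", "Session finalized successfully", {
      sessionId: "sess_abc123",
      duration: 1250,
    }),
  );
}
```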

Metrics & Instrumentation

Latency Histograms

  • loyalty_identify_duration_ms
  • loyalty_calculate_duration_ms
  • loyalty_finalize_duration_ms
  • pdi_request_duration_ms

Error Rates

  • loyalty_errors_total (by error type)
  • pdi_circuit_breaker_state
  • session_expiry_count
  • db_transaction_rollbacks

Business Metrics

  • points_earned_total
  • points_redeemed_total
  • sessions_created_count
  • sessions_finalized_count

Resource Metrics

  • http_requests_total (by endpoint, status)
  • db_connection_pool_size
  • redis_connections_active
  • process_memory_bytes

Prometheus Endpoint

GET /metrics
// Returns Prometheus-formatted metrics
// Scrape interval: every 30 seconds
// Typical response size: 15-20 KB
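
For reference, the Prometheus text format for a labeled counter such as loyalty_errors_total can be rendered as below. This is a hand-rolled sketch with hypothetical helper names, shown only to make the response format concrete; a real service would normally rely on a metrics client library.

```typescript
interface CounterSample {
  labels: Record<string, string>;
  value: number;
}

// Label values escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderCounter(name: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const labels = Object.keys(sample.labels)
      .map((key) => `${key}="${escapeLabelValue(sample.labels[key] ?? "")}"`)
      .join(",");
    lines.push(
      labels.length > 0 ? `${name}{${labels}} ${sample.value}` : `${name} ${sample.value}`,
    );
  }
  return lines.join("\n") + "\n";
}

console.log(
  renderCounter("loyalty_errors_total", "Loyalty errors by type", [
    { labels: { type: "timeout" }, value: 3 },
    { labels: { type: "auth" }, value: 1 },
  ]),
);
```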

Health Check Endpoints

Endpoint | Purpose | Dependencies
GET /health | General liveness probe | Service process only
GET /health/ready | Readiness probe (ready to accept traffic) | Database, Redis, PDI
GET /health/live | Deep health check with dependency status | All critical systems

Health Response Example

200 OK
{
  "status": "healthy",
  "timestamp": "2025-03-16T14:30:45Z",
  "uptime": 3600,
  "checks": {
    "database": { "status": "up", "latency": 5 },
    "redis": { "status": "up", "latency": 2 },
    "pdi": {
      "status": "degraded",
      "circuitBreaker": "HALF_OPEN",
      "latency": 4500
    }
  }
}
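
The example reports an overall "healthy" status even though PDI is degraded. One aggregation rule consistent with that is sketched below; it is an assumption, as are the "down" and "unhealthy" values and the function name, since the guide does not define how checks are combined.

```typescript
type CheckStatus = "up" | "degraded" | "down";

interface DependencyCheck {
  status: CheckStatus;
  latency?: number;
}

// Assumed rule: degraded dependencies are tolerated; any "down" check
// makes the service unhealthy.
function overallStatus(checks: Record<string, DependencyCheck>): "healthy" | "unhealthy" {
  const anyDown = Object.keys(checks).some((key) => checks[key]?.status === "down");
  return anyDown ? "unhealthy" : "healthy";
}

console.log(
  overallStatus({
    database: { status: "up", latency: 5 },
    redis: { status: "up", latency: 2 },
    pdi: { status: "degraded", latency: 4500 },
  }),
); // healthy
```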

Alerting Rules

Critical Alerts

  • Circuit breaker OPEN > 5 min
  • Error rate > 5% for 1 min
  • Database unavailable
  • Memory usage > 80%

Warning Alerts

  • P99 latency > 2s for 5 min
  • Circuit breaker HALF_OPEN > 10 min
  • Session expiry spike > 2x baseline
  • Cache hit rate < 80%

Info Notifications

  • Daily summary email
  • New deployments
  • Configuration changes
  • Database maintenance windows

Dockerfile

Multi-stage build for minimal runtime image with production-grade security.

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
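# Install all dependencies: the build step needs devDependencies (compiler and build tooling)
RUN npm ci
COPY . .
# Build, then prune devDependencies so only production packages reach the runtime stage
RUN npm run build && npm prune --omit=dev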

# Runtime stage
FROM node:20-alpine
WORKDIR /app
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001
COPY --from=builder --chown=nestjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nestjs:nodejs /app/dist ./dist
USER nestjs
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => {if (r.statusCode !== 200) throw new Error(r.statusCode)})"
CMD ["node", "dist/main.js"]

Image Details: Alpine base (11.2 MB), non-root user for security, and a container health check with a 40s start period and a 30s interval.

Docker Compose

Local development environment with all dependencies.

version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/loyalty
      - REDIS_URL=redis://redis:6379
      - PDI_API_URL=https://pdi-api.internal
      - NODE_ENV=development
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=loyalty
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

volumes:
  postgres_data: {}

Environment Variables

Variable | Required | Default | Description
NODE_ENV | Yes | production | Environment: development, test, production
PORT | No | 3000 | HTTP server port
DATABASE_URL | Yes | (none) | PostgreSQL connection string
REDIS_URL | Yes | (none) | Redis connection string for caching
PDI_API_URL | Yes | (none) | PDI API base URL
PDI_CLIENT_ID | Yes | (none) | OAuth2 client ID for PDI
PDI_CLIENT_SECRET | Yes | (none) | OAuth2 client secret (secret)
LOG_LEVEL | No | info | Min log level: debug, info, warn, error
SESSION_TTL_HOURS | No | 24 | Session expiry time in hours
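
Startup validation of these variables could be sketched as a plain function. This is not the actual env.validation.ts: the names are hypothetical, and because NODE_ENV is listed as required but also has a default, the sketch assumes the default applies when it is unset.

```typescript
interface AppConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  redisUrl: string;
  pdiApiUrl: string;
  pdiClientId: string;
  pdiClientSecret: string;
  logLevel: string;
  sessionTtlHours: number;
}

type Env = Record<string, string | undefined>;

// Read a variable that has no default; fail fast if it is unset or empty.
function requireVar(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Parse a positive integer, falling back to the documented default.
function intVar(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function validateEnv(env: Env): AppConfig {
  return {
    nodeEnv: env["NODE_ENV"] ?? "production", // assumption: default applies when unset
    port: intVar(env, "PORT", 3000),
    databaseUrl: requireVar(env, "DATABASE_URL"),
    redisUrl: requireVar(env, "REDIS_URL"),
    pdiApiUrl: requireVar(env, "PDI_API_URL"),
    pdiClientId: requireVar(env, "PDI_CLIENT_ID"),
    pdiClientSecret: requireVar(env, "PDI_CLIENT_SECRET"),
    logLevel: env["LOG_LEVEL"] ?? "info",
    sessionTtlHours: intVar(env, "SESSION_TTL_HOURS", 24),
  };
}

const exampleConfig = validateEnv({
  DATABASE_URL: "postgresql://user:pass@postgres:5432/loyalty",
  REDIS_URL: "redis://redis:6379",
  PDI_API_URL: "https://pdi-api.internal",
  PDI_CLIENT_ID: "example-client",
  PDI_CLIENT_SECRET: "example-secret",
});
console.log(exampleConfig.port, exampleConfig.sessionTtlHours); // 3000 24
```

Failing at startup on a missing variable is cheaper than discovering it on the first PDI call.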

Resource Requirements

Development

  • CPU: 0.5 - 1 vCPU
  • Memory: 512 - 1024 MB
  • Storage: 2 GB
  • Replicas: 1

Staging

  • CPU: 1 - 2 vCPU
  • Memory: 1 - 2 GB
  • Storage: 10 GB
  • Replicas: 2

Production

  • CPU: 2 - 4 vCPU
  • Memory: 2 - 4 GB
  • Storage: 50 GB
  • Replicas: 3+

Database (PostgreSQL)

  • CPU: 2 vCPU (min)
  • Memory: 4 GB (min)
  • Storage: 100 GB + growth buffer
  • Backup: daily snapshots

Kubernetes Deployment

Standard k8s manifests for EKS, GKE, or self-hosted clusters.

Recommended Configuration

apiVersion: v1
kind: Pod
metadata:
  name: loyalty-service
  labels:
    app: loyalty-service
spec:
  containers:
  - name: loyalty-service
    image: loyalty-service:latest
    ports:
    - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /health/live
        port: 3000
      initialDelaySeconds: 40
      periodSeconds: 30
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 3000
      initialDelaySeconds: 20
      periodSeconds: 10
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2000m"
        memory: "2Gi"

Scaling Strategy

Horizontal Scaling

The service is stateless and scales horizontally behind a load balancer.

  • Auto-scale on CPU > 70%
  • Auto-scale on memory > 75%
  • Min replicas: 3
  • Max replicas: 20
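
To see how these thresholds turn into replica counts, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × currentUtilization / targetUtilization), takes the larger of the CPU and memory results, and clamps to the 3-20 range. The function name is hypothetical, and the sketch ignores the autoscaler's tolerance band and stabilization windows.

```typescript
const CPU_TARGET_PCT = 70;
const MEMORY_TARGET_PCT = 75;
const MIN_REPLICAS = 3;
const MAX_REPLICAS = 20;

function desiredReplicas(currentReplicas: number, cpuPct: number, memoryPct: number): number {
  // Each metric proposes a replica count; the largest proposal wins.
  const byCpu = Math.ceil((currentReplicas * cpuPct) / CPU_TARGET_PCT);
  const byMemory = Math.ceil((currentReplicas * memoryPct) / MEMORY_TARGET_PCT);
  return Math.min(MAX_REPLICAS, Math.max(MIN_REPLICAS, byCpu, byMemory));
}

console.log(desiredReplicas(3, 90, 50)); // 4: CPU at 90% against a 70% target
console.log(desiredReplicas(10, 200, 50)); // 20: capped at max replicas
```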

Database Connection Pool

Connection pooling prevents exhaustion under load.

  • Min connections: 5
  • Max connections: 20 per pod
  • Timeout: 30 seconds
  • Connection reuse TTL: 30 min

Rate Limiting

Protects backend systems and ensures fair resource allocation.

  • Global: 1000 req/sec per service
  • Per customer: 100 req/sec
  • Per IP: 500 req/min
  • Token bucket algorithm
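
A token bucket for the per-customer limit can be sketched as follows. The class name is hypothetical, and setting the bucket capacity equal to the per-second rate (100) is an assumption; the guide gives rates but not burst sizes.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Refill for the elapsed time, then try to take `cost` tokens.
  tryConsume(nowMs: number, cost = 1): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Per customer: 100 req/sec, with an assumed burst capacity of 100.
const customerBucket = new TokenBucket(100, 100, 0);
let allowed = 0;
for (let i = 0; i < 150; i++) {
  if (customerBucket.tryConsume(0)) allowed += 1;
}
console.log(allowed); // 100: the remaining 50 requests are rejected
```

In a multi-replica deployment the bucket state would need to live in a shared store such as Redis for the limit to hold across pods.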

Load Balancer Config

Distributes traffic across replicas efficiently.

  • Algorithm: round-robin
  • Health check: every 10s
  • Drain timeout: 30s
  • Session affinity: none

Deployment Checklist

Pre-Deployment

  • Run test suite (unit + integration) [Required]
  • Database migrations tested [Required]
  • Configuration validated [Required]
  • Security scan clean [Critical]
  • Documentation updated

Post-Deployment

  • Smoke tests pass [Required]
  • Health checks green [Required]
  • Metrics flowing to monitoring [Critical]
  • No alert spikes [Critical]
  • Customer traffic normal