Addressing Memory Insufficiency in Doris Computing: Causes and Solutions


Apache Doris, as a high-performance real-time analytical database, has gained significant traction in big data analytics. However, many users encounter memory insufficiency issues during complex computations, leading to query failures or system instability. This article explores the root causes of memory bottlenecks in Doris and provides actionable solutions to optimize resource utilization.


Understanding Memory Usage in Doris

Doris's memory consumption primarily stems from three areas:

  1. Query Execution: Analytical queries involving large datasets, joins, or aggregations require substantial memory for intermediate results.
  2. Data Loading: Bulk data ingestion operations (e.g., Spark Load) create temporary memory pressure.
  3. Metadata Management: Storage of table schemas, partition info, and replica statuses consumes memory, especially in large-scale deployments.

A single BE (Backend) node in Doris typically allocates 20–30% of system memory to query execution by default. When the combined memory demand of concurrent queries exceeds this threshold, "Memory limit exceeded" errors occur.
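
The quickest lever is the per-session query memory limit. The sketch below shows how to inspect and adjust it; exact variable names and defaults vary between Doris versions, and the 8 GB figure is only illustrative.

    -- Inspect memory-related session variables on the current connection
    SHOW VARIABLES LIKE '%mem_limit%';
    -- Raise the per-query execution memory limit for this session only
    -- (8589934592 bytes = 8 GB; size it to fit your BE nodes)
    SET exec_mem_limit = 8589934592;
    -- Or apply it cluster-wide for new sessions (requires admin privileges)
    SET GLOBAL exec_mem_limit = 8589934592;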

Common Causes of Memory Insufficiency

  1. Overly Complex Queries
  • Multi-table joins without proper filtering
  • Unoptimized SQL with redundant calculations
  • High-cardinality GROUP BY or DISTINCT operations (see the rewrite sketch after this list)
  2. Resource Configuration Misalignment
  • Default memory limits unsuitable for workload patterns
  • Improper thread pool sizing leading to resource contention
  3. Data Skew
  • Uneven data distribution across partitions or tablets
  • Hotspot nodes handling disproportionate query loads
  4. Hardware Limitations
  • Insufficient RAM relative to dataset size
  • Lack of vertical scaling for memory-intensive workloads
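
To illustrate the first cause: an exact distinct count over a high-cardinality column forces Doris to hold every distinct value in memory, while an approximate count keeps only a small fixed-size sketch. The table and column names below are hypothetical, and approx_count_distinct trades a small error margin for a large memory saving.

    -- Memory-hungry: materializes every distinct user_id per day
    SELECT dt, COUNT(DISTINCT user_id) AS uv
    FROM user_events
    GROUP BY dt;
    -- Lighter alternative: HyperLogLog-based approximation with bounded memory
    SELECT dt, APPROX_COUNT_DISTINCT(user_id) AS approx_uv
    FROM user_events
    GROUP BY dt;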

Diagnostic Techniques

  1. Monitor Memory Metrics
  • Use SHOW BACKENDS to track BE node memory usage
  • Analyze query profiles via EXPLAIN ANALYZE
  • Enable Doris's built-in Prometheus metrics for granular insights
  2. Identify Memory-Intensive Queries (a triage sketch follows this list)

    SELECT query_id, user, memory_usage 
    FROM information_schema.query_statistics 
    ORDER BY memory_usage DESC LIMIT 10;
  3. Tablet Health Checks: verify tablet distribution with:

    ADMIN SHOW REPLICA DISTRIBUTION FROM tbl_name;
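
To act on what these checks reveal, the MySQL-compatible statements below profile a suspect session and, if necessary, terminate the connection running a runaway query; the connection id 12345 is a placeholder.

    -- Collect detailed runtime profiles for subsequent queries in this session
    SET enable_profile = true;
    -- List running connections and note the Id of the offender
    SHOW PROCESSLIST;
    -- Terminate the offending connection (12345 is a placeholder id)
    KILL 12345;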

Optimization Strategies

1. Query-Level Tuning

  • Predicate Pushdown: Filter data early in execution plans
    SELECT * FROM orders WHERE date >= '2023-01-01' -- Push filter to storage layer
  • Avoid Cartesian Products: Use explicit JOIN conditions (a combined example follows this list)
  • Limit Intermediate Result Sets:
    SELECT * FROM (SELECT ... SUBQUERY) t LIMIT 1000 -- Reduce temp table size
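
The sketch below combines the three ideas above: filters written so Doris can push them down to the scan, a join with an explicit equality key, and a capped result set. The orders and customers tables and their columns are hypothetical.

    -- Filter early, join on an explicit key, and bound the output
    SELECT o.order_id, c.customer_name, o.amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id  -- explicit join key, no Cartesian product
    WHERE o.order_date >= '2023-01-01'                 -- predicate pushed to the storage layer
      AND c.region = 'EU'
    ORDER BY o.amount DESC
    LIMIT 1000;                                        -- cap intermediate/output rows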

2. System Configuration Adjustments

  • Memory Parameters:
    # be.conf 
    mem_limit = 70%        # Percentage of total memory 
    storage_page_cache_limit=32G  # Adjust based on SSD/HDD ratio
  • Enable Spill-to-Disk (parameter names vary across Doris versions; see the check after this list):
    spill_mode = auto 
    spill_storage_root_path = /data/spill
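
Because these knobs have been renamed across Doris releases, it is worth confirming what your build actually exposes before editing be.conf; the statements below are standard Doris commands and simply return nothing when a name does not match.

    -- List session variables related to memory and spilling in this build
    SHOW VARIABLES LIKE '%mem%';
    SHOW VARIABLES LIKE '%spill%';
    -- Inspect memory-related FE configuration without a restart
    ADMIN SHOW FRONTEND CONFIG LIKE '%mem%';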

3. Data Modeling Improvements

  • Partitioning:
    PARTITION BY RANGE(dt) (FROM ("2022-01-01") TO ("2024-01-01") INTERVAL 1 MONTH)
  • Aggregation Tables: Precompute frequent metrics using Aggregate Key models
  • Colocate Related Data: Minimize network transfers during joins (a DDL sketch combining these ideas follows this list)
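
A minimal DDL sketch tying these three points together, with hypothetical table, column, and group names: a monthly range-partitioned Aggregate Key table that pre-sums an order metric and joins colocated tables without a network shuffle. Bucket count and replication are placeholders to adapt.

    CREATE TABLE sales_agg (
        dt          DATE       COMMENT "order date, also the partition column",
        product_id  BIGINT     COMMENT "join and distribution key",
        sale_amount BIGINT SUM DEFAULT "0" COMMENT "pre-aggregated metric"
    )
    AGGREGATE KEY (dt, product_id)
    PARTITION BY RANGE (dt) (
        FROM ("2022-01-01") TO ("2024-01-01") INTERVAL 1 MONTH
    )
    DISTRIBUTED BY HASH (product_id) BUCKETS 16
    PROPERTIES (
        "replication_num" = "3",
        -- tables sharing this group keep matching buckets on the same BE nodes
        "colocate_with" = "sales_group"
    );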

4. Hardware Scaling

  • Vertical Scaling: Upgrade BE nodes to 128GB+ RAM for OLAP workloads
  • Horizontal Scaling: Add BE nodes and rebalance tablets (see the sketch after this list)
  • Hybrid Storage: Use SSDs for hot data, HDDs for cold storage
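
For the horizontal path, registering a new BE is a single statement; the host name and heartbeat port below are placeholders, and Doris rebalances tablets onto the new node automatically once it joins.

    -- Register a new BE node (host and heartbeat port are placeholders)
    ALTER SYSTEM ADD BACKEND "be-new-01:9050";
    -- Watch the node come online and tablets rebalance onto it
    SHOW BACKENDS;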

Case Study: E-commerce Platform Optimization

A retail company using Doris for real-time inventory analysis faced OOM errors during peak sales. After implementing:

  • Query timeout reduction from 300s → 60s
  • Tablet redistribution across 20 → 50 BE nodes
  • Creation of materialized views for the top 100 product metrics (see the sketch after this case study)

Memory errors decreased by 92% while P99 query latency stayed under 2s.
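
As an illustration of the third measure, a synchronous materialized view in Doris pre-aggregates hot metrics so frequent dashboard queries no longer scan and aggregate raw rows; the table, columns, and view name here are hypothetical.

    -- Pre-aggregate per-product sales so reads hit the rollup instead of raw rows
    CREATE MATERIALIZED VIEW mv_product_sales AS
    SELECT product_id, SUM(sale_amount), COUNT(order_id)
    FROM orders
    GROUP BY product_id;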

Future Directions

  1. Dynamic Memory Allocation: Per-query memory budgeting based on priority
  2. Cloud-Native Enhancements: Kubernetes-aware memory autoscaling
  3. AI-Driven Optimization: Machine learning models for predictive resource allocation

Memory optimization in Doris requires a holistic approach combining SQL tuning, system configuration, and infrastructure planning. By adopting these strategies, organizations can achieve stable performance even with growing data volumes and analytical complexity. Regular monitoring and proactive capacity planning remain critical to prevent memory-related bottlenecks.
