Troubleshooting Worker Crashing Issues

Last updated: November 7, 2024

Issue: Worker Crashing During Run

When a private worker crashes in Spacelift, the cause is often insufficient memory for the stack being processed. This article explains how to diagnose and resolve worker crashes.

For Private Workers

Memory exhaustion is the most common cause of private worker crashes, especially when the worker processes large or complex stacks.

Steps to Diagnose and Resolve

  1. Monitor CPU and Memory Usage
    Check the CPU and memory usage of your EC2 instance (or equivalent infrastructure) while a run is in progress to see whether the worker is reaching its resource limits. If it is, which is likely with large stacks, move to a larger instance type with more memory and processing power.

  2. Enable Debug Logging for Terraform Stacks
    If you're using Terraform, set the environment variable TF_LOG=DEBUG to enable detailed logging. The extra output shows what Terraform was doing when the worker crashed, which can help you identify the operations that consume the most memory.

    How to Set TF_LOG:

    • Go to your stack's Environment tab.

    • Add an environment variable:

      • Name: TF_LOG

      • Value: DEBUG
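
The memory check in step 1 can also be done directly on the worker host. The following is a minimal sketch, assuming the private worker runs on a Linux host that exposes /proc/meminfo with a MemAvailable field. The function names are hypothetical and are not part of any Spacelift tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(text):
    """Return the share of memory in use, based on MemTotal and MemAvailable."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return 100.0 * (total - available) / total


def host_memory_used_percent(path="/proc/meminfo"):
    """Read the host's meminfo file (Linux only) and return memory in use as a percentage."""
    with open(path) as handle:
        return memory_used_percent(handle.read())
```

Calling host_memory_used_percent() at intervals while a run is in progress shows whether usage climbs toward 100% before the crash, which would point to memory exhaustion on the instance.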

For Public Workers

If you encounter crashing issues with a public worker, please reach out to the Spacelift support team. Be sure to provide your Run ID so they can assist you more effectively.

Summary

Worker crashes are commonly caused by memory limitations, especially on private workers handling large stacks. Monitoring resource usage and enabling TF_LOG=DEBUG can help identify the root cause. For public worker issues, contact the Spacelift support team and include your Run ID.