Diagnosing Intermittent Failures in PCBAs

Author: Farway Electronic  Time: 2025-09-27
Picture this: It's 2 a.m. in the engineering lab, and Maria, a senior test engineer, is staring at an oscilloscope screen in frustration. The PCBA (Printed Circuit Board Assembly) in front of her, part of a new smart thermostat, has been behaving erratically all week. Sometimes it boots up perfectly, communicating with the app and regulating temperature flawlessly. Other times, it freezes midway through a cycle, or worse, sends garbled data that makes the app crash. The kicker? When she runs standard continuity tests or checks voltages at the test points, everything reads normal. No shorts, no opens, no obvious defects. Just… silence when it decides to fail.
Intermittent failures like this are the bane of electronics manufacturing. Unlike permanent failures—where a component is clearly blown or a trace is severed—intermittent issues hide in the shadows, striking only under specific conditions: temperature spikes, mechanical vibration, humidity, or even just the passage of time. They're frustrating not just because they're hard to reproduce, but because they erode trust. A product that works "most of the time" isn't good enough for customers, and for manufacturers, it translates to delayed shipments, increased warranty claims, and wasted engineering hours.
In this article, we'll dive into the world of diagnosing intermittent PCBA failures. We'll explore why they happen, how to systematically track them down, and what tools and strategies can turn "sometimes broken" into "always reliable." Along the way, we'll touch on critical aspects of electronics manufacturing, from SMT PCB assembly quality to the role of PCB conformal coating, and how proactive practices like using electronic component management software can prevent these headaches in the first place.

What Are Intermittent Failures, Anyway?

Before we fix the problem, let's define it. An intermittent failure is a malfunction that occurs unpredictably, with periods of normal operation in between. Think of it as a "flaky" connection or component that works when conditions are just right but fails when they're not. Contrast this with a permanent failure, where the PCBA stops working entirely and stays broken—like a burned-out resistor or a cracked trace.
Intermittent failures often follow patterns, even if they're subtle. For example:
  • Environmental triggers: The PCBA fails when the temperature rises above 40°C, but works fine in a 25°C lab. Or it starts acting up during monsoon season, when humidity spikes.
  • Mechanical triggers: A device that fails after being dropped, or when the user taps the case. Loose connectors or solder joints often reveal themselves here.
  • Power-related triggers: Flickering when the battery voltage dips below 3.2V, or when the power supply has a momentary surge.
  • Time-based triggers: Working for the first 10 minutes after power-up, then failing. This could point to heat buildup or component degradation over short periods.
The key to diagnosing these issues is to treat them as detective stories: gather clues, identify patterns, and test hypotheses until the root cause emerges. Let's start by exploring where these clues often lead—common sources of intermittent failures in PCBAs.

Common Root Causes of Intermittent Failures

Intermittent failures rarely come from a single source. They're often the result of a perfect storm of manufacturing defects, material weaknesses, and environmental stress. Let's break down the most likely culprits.

1. Manufacturing Defects: The Hidden Flaws of Assembly

Even with state-of-the-art SMT PCB assembly lines, microscopic flaws can slip through quality checks, and some of them only cause problems under specific conditions. For example:
Tombstoning or solder balling in SMT components: During SMT assembly, tiny components like 0402 resistors or capacitors can "tombstone" (stand on end) if solder paste is unevenly applied, leaving one termination barely touching its pad or not connected at all. Uneven paste can also leave small solder balls between pads. A stray ball might not create a short initially, but thermal expansion or vibration can close the gap and bridge the pads temporarily. Once the board cools or the ball shifts, the short clears again.
Cold joints in dip soldering: Through-hole components (like connectors or large capacitors) are often soldered by dip soldering, a process where the board is dipped into molten solder to form joints. If the solder temperature is too low, or the board is withdrawn too quickly, "cold joints" can form: dull, grainy solder connections that lack proper adhesion. Over time, vibration or thermal cycling can loosen these joints, causing intermittent opens.
Insufficient solder mask coverage: The solder mask—the green (or sometimes red, blue, or black) coating that protects copper traces—can have pinholes or thin spots, especially around tightly spaced components. These weak points can allow moisture or dust to accumulate, creating intermittent shorts that come and go with humidity levels.

2. Component Issues: When Parts "Misbehave"

Components are the building blocks of PCBAs, but they're not infallible. Even brand-new parts can harbor defects that only surface intermittently. Common issues include:
ESD-damaged components: Electrostatic discharge (ESD) during handling can weaken semiconductors like ICs or MOSFETs without killing them immediately. Instead, they develop "latent defects"—microscopic cracks in gate oxides or damaged junctions—that fail under stress (e.g., high voltage or temperature). A MOSFET might work at 25°C but short when heated to 60°C, then recover when cooled.
Batch-specific component defects: Sometimes, a single batch of components from a supplier has hidden flaws. For example, a capacitor lot with inconsistent dielectric thickness might leak current intermittently, or a resistor batch with poor solderability might develop high resistance after a few thermal cycles. This is where electronic component management software becomes invaluable: by tracking batch numbers, supplier quality scores, and failure rates, manufacturers can quickly identify if a specific part is the culprit.
Mechanical stress on connectors: Board-to-board connectors or wire harnesses with loose pins or misaligned contacts can create intermittent connections. In portable devices, repeated flexing (e.g., a laptop hinge or a wearable's band) can strain these connections, leading to failures that only occur when the device is moved.

3. Environmental Stress: When the World Takes Its Toll

PCBAs don't exist in a vacuum—they're exposed to heat, cold, moisture, and vibration. Over time, these forces can turn marginal defects into full-blown intermittent failures:
Thermal expansion mismatches: Different materials in a PCBA (copper, FR-4 substrate, component packages) expand and contract at different rates when heated or cooled. This creates stress on solder joints and traces. A BGA (Ball Grid Array) component with a few weak solder balls might stay connected at room temperature, but when the board heats up, the substrate expands, pulling the balls apart and causing an open circuit. Once cooled, the connection re-forms.
Moisture and corrosion under conformal coating: PCB conformal coating is designed to protect boards from moisture, dust, and chemicals, but only if applied correctly. A cracked or uneven coating can trap moisture and ionic residue between the coating and the board. In humid environments, that moisture film can form a leakage path between traces and, under bias, drive electrochemical migration that grows tiny metal dendrites across the gap, leading to intermittent shorts. When the board dries out (e.g., in low humidity), the conductive film disappears and the leakage often stops, although any dendrites that formed remain in place and can short again the next time the conditions recur.
Vibration-induced wear: In automotive or industrial applications, constant vibration can loosen solder joints, especially on heavy components like transformers or connectors. A joint that's already weakened by a cold solder defect will fail faster under these conditions, creating intermittent opens that come and go with vehicle movement or machine operation.
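To get a feel for how much stress a thermal expansion mismatch can put on a joint, here is a minimal first-order sketch in Python. It uses the simple "distance from neutral point" approximation: the relative displacement between part and board is the CTE difference times the temperature swing times the distance from the center of the part, and the shear strain is that displacement divided by the joint height. The function name and all numeric values are illustrative assumptions, not measurements from any specific board, and the model ignores board warpage and solder creep.

```python
def solder_joint_shear_strain(cte_board_ppm, cte_part_ppm, delta_t_c,
                              dnp_mm, joint_height_mm):
    """First-order shear strain in a solder joint caused by CTE mismatch.

    cte_*_ppm       : coefficients of thermal expansion in ppm per °C
    delta_t_c       : temperature swing in °C
    dnp_mm          : distance of the joint from the part's center (neutral point)
    joint_height_mm : standoff height of the solder joint
    """
    delta_alpha = abs(cte_board_ppm - cte_part_ppm) * 1e-6  # per °C
    displacement_mm = delta_alpha * delta_t_c * dnp_mm       # relative movement
    return displacement_mm / joint_height_mm                 # dimensionless strain


# Assumed example values: board 16 ppm/°C, part 6 ppm/°C, 40 °C swing,
# corner joint 10 mm from the part center, 0.3 mm joint height.
strain = solder_joint_shear_strain(16.0, 6.0, 40.0, 10.0, 0.3)
print(f"Estimated shear strain: {strain:.2%}")
```

The takeaway from the formula is that corner joints on large packages (large distance from center) with low standoff height see the most strain, which is why they tend to be the first suspects.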

Diagnostic Tools: The Engineer's Detective Kit

To solve intermittent failures, you need the right tools—tools that can "see" what standard multimeters or visual inspections miss. Here's a breakdown of the most effective diagnostic equipment and techniques, along with when to use them.
| Diagnostic Method | Tools Required | Best For Detecting… | Limitations |
|---|---|---|---|
| Thermal Imaging | Infrared (IR) camera or thermal scanner | Hot spots from shorted components, poor solder joints, or overheating ICs | Cannot detect cold joints or defects not related to heat |
| Vibration Testing | Shaker table, accelerometer | Loose connections, cracked traces, or weak solder joints under mechanical stress | Requires controlled environment; may not replicate real-world vibration patterns |
| Environmental Chambers | Temperature/humidity chambers, thermal cyclers | Failures triggered by heat, cold, or moisture (e.g., conformal coating issues) | Slow process; may take hours/days to replicate conditions |
| X-Ray Inspection | PCB X-ray machine | Hidden solder defects in BGA, CSP, or QFN components (e.g., voids, insufficient wetting) | Expensive; requires trained operators to interpret images |
| ESD Testing | ESD gun, voltage meter | Latent ESD damage in semiconductors (e.g., MOSFETs, ICs) | Can permanently damage already weakened components |
While these tools are powerful, they're only as effective as the strategy behind them. Let's walk through a step-by-step process to diagnose intermittent failures, using these tools to test hypotheses systematically.

Step-by-Step Troubleshooting Process

Diagnosing intermittent failures isn't about random testing—it's about following a structured approach to narrow down the possibilities. Here's how to do it:

Step 1: Reproduce the Failure (and Document Everything)

The first rule of troubleshooting intermittent failures: if you can't reproduce the problem, you can't fix it. Start by creating a "failure log" that records:
  • Conditions when failure occurs (temperature, humidity, time of day, device orientation)
  • Actions that precede failure (e.g., "fails after 10 minutes of continuous operation" or "fails when tapped on the top-left corner")
  • Symptoms (e.g., "screen freezes," "no communication via UART," "voltage drop at test point TP12")
  • Conditions when it recovers (e.g., "works again after cooling for 5 minutes" or "after power cycling twice")
For example, Maria (our engineer with the smart thermostat) might note: "Failure occurs when ambient temp > 35°C; device stops sending data to app. Recovers within 2 minutes of cooling to < 30°C." This immediately points to a thermal-related issue.
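A failure log becomes far more useful once it is structured enough to query. The sketch below, in Python, shows one possible shape for such a log and a helper that brackets the temperature at which failures start. The `Observation` fields, the function name, and the sample readings are all hypothetical, chosen only to mirror the thermostat example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    ambient_c: float        # ambient temperature when the unit was observed
    minutes_powered: float  # time since power-up
    failed: bool            # True if the failure symptom was present
    symptom: str = ""       # free-text description, e.g. "no data to app"


def temperature_bracket(log: List[Observation]) -> Tuple[Optional[float], Optional[float]]:
    """Return (highest passing temperature, lowest failing temperature)."""
    passing = [o.ambient_c for o in log if not o.failed]
    failing = [o.ambient_c for o in log if o.failed]
    return (max(passing) if passing else None,
            min(failing) if failing else None)


# Hypothetical observations for illustration only.
log = [
    Observation(25.0, 30, False),
    Observation(31.0, 45, False),
    Observation(34.0, 20, False),
    Observation(36.0, 15, True, "no data to app"),
    Observation(38.0, 12, True, "no data to app"),
    Observation(41.0, 8, True, "garbled data"),
]
bracket = temperature_bracket(log)
print(f"Passes up to {bracket[0]} °C, fails from {bracket[1]} °C")
```

If the highest passing temperature sits below the lowest failing temperature, temperature cleanly separates good from bad runs and is a strong candidate trigger. If the two ranges overlap, something else (time, vibration, supply voltage) is probably involved too.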

Step 2: Visual and Physical Inspection

Before breaking out the fancy tools, start with the basics: a thorough visual inspection. Use a stereo microscope (10-40x magnification) to check for:
  • Cold joints: Dull, grainy solder joints (especially on through-hole components from dip soldering)
  • Cracked traces or solder mask: Look for hairline fractures near component leads or flex points
  • Conformal coating issues: Bubbles, cracks, or thin spots in the PCB conformal coating, especially around high-heat components
  • Component damage: Burn marks, bulging capacitors, or bent leads on connectors
Physical inspection also includes gentle probing: using a non-conductive tool (like a plastic spudger) to press on components or flex the board slightly while monitoring for failures. If the device fails when pressing on a BGA, for example, it's a strong indicator of a solder joint issue.
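"Monitoring for failures" during probing works best when the monitoring is logged rather than eyeballed, since an intermittent open can last only a few samples. Here is a minimal Python sketch that scans a list of logged resistance readings across a suspect connection and reports each run of readings above a threshold. The function name, the 100 Ω threshold, and the sample data are assumptions for illustration; pick a threshold and sample rate that suit your own circuit and logger.

```python
def find_open_events(samples_ohm, threshold_ohm=100.0, min_samples=1):
    """Return (start_index, length) for each run of readings above threshold."""
    events = []
    run_start = None
    for i, reading in enumerate(samples_ohm):
        if reading > threshold_ohm:
            if run_start is None:
                run_start = i          # a new high-resistance run begins
        elif run_start is not None:
            if i - run_start >= min_samples:
                events.append((run_start, i - run_start))
            run_start = None           # run ended, back to normal
    # Handle a run that is still in progress at the end of the log.
    if run_start is not None and len(samples_ohm) - run_start >= min_samples:
        events.append((run_start, len(samples_ohm) - run_start))
    return events


# Hypothetical readings in ohms, logged while pressing near a suspect joint.
samples = [0.2, 0.2, 950.0, 1200.0, 0.3, 0.2, 5000.0, 0.2]
events = find_open_events(samples)
print(events)
```

Lining up the event indices with a record of where and when you pressed makes it much easier to tell a genuine mechanical trigger from coincidence.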

Step 3: Stress Testing (Replicate the "Perfect Storm")

Once you have a hypothesis (e.g., "failure is temperature-related"), use environmental chambers or thermal cyclers to replicate the conditions. For Maria's thermostat, this might mean placing the PCBA in a chamber set to cycle between 30°C and 40°C while monitoring communication with the app. If the failure occurs consistently during the high end of the cycle, you've confirmed the trigger.
For mechanical issues, use a shaker table to simulate vibration (e.g., 10-2000 Hz, 10 g acceleration) while running the device. For humidity-related failures, expose the board to 85% relative humidity at 85°C (the common "85/85" test) to see if moisture triggers shorts under the conformal coating.
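When planning a thermal cycling run like the one described in this step, it helps to write the profile down as data, so you know how long the run will take and which phase the chamber was in when a failure was logged. The Python sketch below builds such a schedule. The function name, ramp rate, and dwell time are assumptions for illustration, and it does not talk to any real chamber controller.

```python
def thermal_cycle_schedule(low_c, high_c, cycles, ramp_c_per_min, dwell_min):
    """Return a list of (start_min, end_min, phase, target_c) segments."""
    ramp_min = (high_c - low_c) / ramp_c_per_min
    phases = (
        ("ramp_up", ramp_min, high_c),
        ("dwell_high", dwell_min, high_c),
        ("ramp_down", ramp_min, low_c),
        ("dwell_low", dwell_min, low_c),
    )
    segments = []
    t = 0.0
    for _ in range(cycles):
        for phase, duration, target in phases:
            segments.append((t, t + duration, phase, target))
            t += duration
    return segments


def phase_at(schedule, minute):
    """Look up which phase the chamber was in at a given elapsed minute."""
    for start, end, phase, _target in schedule:
        if start <= minute < end:
            return phase
    return None


# Assumed profile: 30 °C to 40 °C, 2 cycles, 2 °C/min ramps, 10 min dwells.
schedule = thermal_cycle_schedule(30.0, 40.0, cycles=2,
                                  ramp_c_per_min=2.0, dwell_min=10.0)
total_minutes = schedule[-1][1]
print(f"Run length: {total_minutes} min; at minute 12: {phase_at(schedule, 12.0)}")
```

If logged failures cluster in `dwell_high` or late in `ramp_up`, that supports a heat-triggered hypothesis. Failures during the ramps alone would point more toward expansion stress than toward absolute temperature.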

Step 4: Advanced Electrical Testing

Once you've narrowed down the trigger and location (e.g., "the failure occurs near the BGA WiFi module when heated"), use advanced tools to dig deeper:
In-circuit testing (ICT): ICT machines check individual components and traces for continuity, resistance, and capacitance. They can identify if a resistor's value drifts when heated, or if a capacitor leaks current under stress.
Functional testing under stress: Integrate the PCBA into a test fixture that runs its full range of functions (e.g., booting up, communicating, processing data) while applying thermal or vibration stress. This is where the PCBA testing process becomes critical: by combining functional tests with stressors, you can catch intermittent failures that only occur during real-world operation.
X-ray or CT scanning: For BGA, CSP, or other hidden components, X-ray imaging can reveal solder voids, cracks, or insufficient wetting that cause intermittent connections under thermal expansion. A CT scan (3D X-ray) can even show the shape of solder joints, highlighting areas where the connection is weak.
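X-ray findings are easier to compare across balls and boards when they are turned into a number. One common way to express voiding is void area as a percentage of the ball's area in the 2D X-ray image. The Python sketch below approximates the ball and each void as circles measured by diameter, so the π/4 factors cancel. The function names, ball identifiers, and diameters are hypothetical, and any accept/reject limit should come from your own workmanship standard rather than from this example.

```python
def void_percentage(ball_diameter_um, void_diameters_um):
    """Void area as a percentage of ball area, treating both as circles."""
    ball_area = ball_diameter_um ** 2            # pi/4 cancels in the ratio
    void_area = sum(d ** 2 for d in void_diameters_um)
    return 100.0 * void_area / ball_area


def rank_balls_by_voiding(measurements):
    """Return [(ball_id, void_pct), ...] sorted with the worst ball first."""
    scored = [(ball_id, void_percentage(ball_d, voids))
              for ball_id, (ball_d, voids) in measurements.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical measurements: ball_id -> (ball diameter, [void diameters]) in µm.
measurements = {
    "A1": (400.0, [50.0]),
    "A2": (400.0, [150.0, 100.0]),
    "B1": (400.0, []),
}
ranking = rank_balls_by_voiding(measurements)
for ball_id, pct in ranking:
    print(f"{ball_id}: {pct:.1f}% voiding")
```

Ranking the balls this way gives you a short list of joints to watch while the board is under thermal stress.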

Step 5: Root Cause Verification

Once you've identified a suspected cause (e.g., "BGA WiFi module has 20% solder voids"), verify it by fixing the issue and retesting. For example, reflow the BGA to repair the solder joints, then run the thermal cycle test again. If the failure disappears, you've found your culprit. If not, go back to the hypothesis board—intermittent failures sometimes require testing multiple fixes before the root cause is confirmed.
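With intermittent failures, "it passed after the fix" only means something if the retest was long enough that an unfixed board would almost certainly have failed. A rough way to size the retest, assuming each stress cycle fails independently with the same probability before the fix, is to find the number of consecutive clean cycles that would be very unlikely by chance. The Python sketch below does that. The function name and the example probabilities are assumptions, and real failures are rarely perfectly independent, so treat the result as a lower bound rather than a guarantee.

```python
import math


def cycles_needed(failure_prob_per_cycle, confidence=0.95):
    """Smallest n such that an unfixed board would fail at least once
    in n cycles with probability >= confidence."""
    if not 0.0 < failure_prob_per_cycle < 1.0:
        raise ValueError("failure_prob_per_cycle must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # P(no failure in n cycles) = (1 - p)^n <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - failure_prob_per_cycle))


# If the board failed in roughly 1 of every 10 cycles before the fix:
print(cycles_needed(0.10))   # cycles to run for 95% confidence
# If it failed in about half of the cycles:
print(cycles_needed(0.50))
```

The rarer the original failure, the longer the verification run has to be, which is one more reason to find a stress condition that makes the failure frequent before attempting a fix.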

Case Study: The "Thermostat Mystery" Solved

Let's return to Maria and the smart thermostat. After logging failures and confirming they occurred at >35°C, she used a thermal camera to scan the board during operation. The culprit? A small 0603 capacitor near the WiFi module that heated up to 42°C—5°C higher than surrounding components.
X-ray inspection revealed the capacitor had a cold joint (likely from uneven solder paste during SMT assembly). When the board heated up, thermal expansion stressed the joint at the capacitor's termination, breaking the already weak connection and cutting power to the WiFi module. Once cooled, the joint closed again, reconnecting the circuit.
The fix? Re-soldering the capacitor with a small amount of additional solder paste. After rework, the thermostat passed 50 thermal cycles (30°C to 50°C) without failure. To prevent future issues, Maria's team also reviewed their SMT PCB assembly process, adjusting solder paste deposition for 0603 components to ensure better wetting.

Preventing Intermittent Failures: Proactive Practices

While diagnosing intermittent failures is satisfying, preventing them is even better. Here are four proactive strategies manufacturers can implement:

1. Strengthen Manufacturing Quality Control

Invest in automated inspection tools for SMT PCB assembly and dip soldering lines. AOI (Automated Optical Inspection) can catch tombstoning or solder balling in SMT components, while AXI (Automated X-ray Inspection) flags BGA/CSP voids before boards leave the factory. For dip soldering, use in-line vision systems to check for cold joints or insufficient solder fillets on through-hole components.

2. Improve Component Management

Electronic component management software isn't just for tracking inventory—it's a quality control tool. Use it to:
  • Flag components with high failure rates or supplier quality issues
  • Track batch numbers to quickly recall boards if a component lot is defective
  • Store datasheets and thermal profiles to ensure components are used within their rated limits
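The batch-tracking idea boils down to a simple query: group field or test results by component lot and flag the lots whose failure rate stands out. The Python sketch below shows that logic on its own, independent of any particular component management product. The function names, lot identifiers, thresholds, and counts are all hypothetical.

```python
from collections import defaultdict


def failure_rate_by_lot(records):
    """records: iterable of (lot_id, failed) -> {lot_id: (failures, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])   # lot_id -> [failures, total]
    for lot_id, failed in records:
        counts[lot_id][1] += 1
        if failed:
            counts[lot_id][0] += 1
    return {lot: (f, n, f / n) for lot, (f, n) in counts.items()}


def suspect_lots(records, rate_limit=0.05, min_units=20):
    """Lots with enough units to judge and a failure rate above rate_limit."""
    stats = failure_rate_by_lot(records)
    return sorted(lot for lot, (_f, n, rate) in stats.items()
                  if n >= min_units and rate > rate_limit)


# Hypothetical data: LOT-A has 1 failure in 100 boards, LOT-B has 6 in 50,
# LOT-C has 1 failure in 3 boards (too few units to judge).
records = ([("LOT-A", False)] * 99 + [("LOT-A", True)]
           + [("LOT-B", False)] * 44 + [("LOT-B", True)] * 6
           + [("LOT-C", False)] * 2 + [("LOT-C", True)])
flagged = suspect_lots(records)
print(flagged)
```

The `min_units` guard is there so that one unlucky board from a small lot does not trigger a recall. Tune both thresholds to your own volumes and risk tolerance.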

3. Enhance Conformal Coating and Environmental Protection

A well-applied PCB conformal coating is a first line of defense against moisture and dust. Use automated spray systems for even coverage, and inspect coatings under UV light (if the coating material contains a UV-fluorescent tracer) to check for pinholes or thin spots. For high-humidity applications, consider parylene coating, a thin, vapor-deposited polymer that is essentially pinhole-free and conforms to component shapes better than traditional epoxy or acrylic coatings.

4. Integrate Stress Testing into the PCBA Testing Process

Don't wait for customers to find intermittent failures; test for them during manufacturing. Add thermal cycling, vibration, and humidity testing to your PCBA testing process, especially for critical components like BGAs or connectors. Even a short 10-cycle thermal test (0°C to 60°C) can uncover weak solder joints before products ship.

Conclusion: Turning Frustration into Reliability

Intermittent PCBA failures are frustrating, but they're not unbeatable. By approaching them as detective work (gathering clues, testing hypotheses, and using the right tools), engineers can uncover even the most hidden defects. And by focusing on prevention (strengthening assembly processes, using electronic component management software to track quality, and enhancing PCB conformal coating), manufacturers can build PCBAs that work reliably, no matter what conditions they face.
At the end of the day, the goal isn't just to fix a single faulty board. It's to build a culture of quality—one where intermittent failures are seen not as nuisances, but as opportunities to make better products. And that's a win for engineers, manufacturers, and customers alike.