Why My Crystal Does Not Start Up and How It Is Related, Surprisingly, to the MCU Itself
Well, as we all know, the first and most basic input an MCU or MPU needs after power-up is a clock source.
There are several options to feed an MCU with an external clock source: RC circuits, ceramic resonators, crystals (also known as quartz crystals), crystal oscillators, and silicon/MEMS oscillator modules.
The optimal clock source for an application depends on many factors including cost, accuracy, power consumption, environmental parameters, etc.
Here I want to discuss ceramic resonators and crystals, which are sometimes referred to as XTALs. For simplicity I treat them together: a ceramic resonator uses a piezoelectric ceramic element instead of quartz and is often available with built-in load capacitors, while quartz crystals are more accurate and more temperature-stable than ceramic resonators.
Crystals are widely used in cost-sensitive applications for the obvious reason that they are cheap. However, they also offer good precision and high frequency stability, so they are used in many applications where cost is not the main concern.
Many MCU vendors publish their own application notes that guide designers on how to connect the crystal to the MCU correctly, how to choose the right capacitor and resistor values, what PCB layout concerns to watch for, etc.
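As a quick illustration of the capacitor calculation such application notes describe, here is a minimal Python sketch of the usual textbook formula for a Pierce oscillator: the two load capacitors appear in series across the crystal, plus a stray term that lumps together pin, bond-wire and PCB capacitance. The function names and the 4 pF stray value are assumptions for illustration only; take the specified load capacitance from your crystal's datasheet and the stray estimate from your MCU vendor's application note.

```python
def load_capacitance(c1_pf: float, c2_pf: float, c_stray_pf: float) -> float:
    """Effective load capacitance seen by the crystal, in pF.

    C1 and C2 are the external capacitors from each oscillator pin to
    ground; they appear in series across the crystal. C_stray lumps
    together pin, bond-wire and PCB trace capacitance (an estimate).
    """
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + c_stray_pf


def symmetric_cap_for(cl_pf: float, c_stray_pf: float) -> float:
    """Value for C1 = C2 that yields the crystal's specified load capacitance."""
    if cl_pf <= c_stray_pf:
        raise ValueError("stray capacitance already exceeds the target CL")
    # Two equal capacitors in series give half their value: C/2 + Cs = CL.
    return 2.0 * (cl_pf - c_stray_pf)


if __name__ == "__main__":
    # Hypothetical example: crystal specified for CL = 12 pF, assumed 4 pF stray.
    c = symmetric_cap_for(12.0, 4.0)
    print(f"C1 = C2 = {c:.1f} pF")                              # C1 = C2 = 16.0 pF
    print(f"check: CL = {load_capacitance(c, c, 4.0):.1f} pF")  # check: CL = 12.0 pF
```

In practice the computed value is rounded to the nearest standard capacitor value, and the stray estimate is the weakest part of the calculation, which is exactly why a later change in package parasitics can matter.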
Eventually, after the design, testing, and field trials are complete, you are satisfied that the design reflects “good engineering.” Everything works fine, and the product goes into production, usually at a contract manufacturer (CM) somewhere in the world.
Then, after some time, which can be six months or several years, you get a call from your CM: the product fails the go/no-go test after assembly. After an investigation, which takes time, you find out that the crystal does not start up and the MCU is not running.
You scratch your head and think, “What the hell is going on here?” You have already forgotten about this product, you are deeply involved in a new one, you do not remember the details of the design, the original designer no longer works for you, and of course you do not have time. A real mess!
This phenomenon does not happen often. I am aware of about a dozen cases in the last 10 years, but I assume there are more cases that I am not aware of.
Actually, the culprit is not the crystal, but… the MCU and a process known as a “die shrink”. Put simply, a die shrink recreates an identical circuit using a more advanced fabrication process, reducing the transistor/gate size and the spacing of the interconnects. Because each die is smaller, more dies fit on the same silicon wafer, which lowers the cost per part.
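To put a rough number on “more dies per wafer”, the sketch below uses a widely quoted gross-dies-per-wafer approximation: wafer area divided by die area, minus an estimate of the partial dies lost around the circular edge. The wafer diameter and die areas are hypothetical, and the estimate ignores scribe lines and yield.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate gross dies per wafer.

    Uses pi*(d/2)^2 / S - pi*d / sqrt(2*S): the first term is wafer area
    over die area, the second estimates partial dies lost at the edge.
    Scribe lines and yield are ignored.
    """
    d, s = wafer_diameter_mm, die_area_mm2
    gross = math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s)
    return math.floor(gross)  # only whole dies count


# Hypothetical example: 300 mm wafer, die shrunk from 10 mm^2 to 6 mm^2.
before = dies_per_wafer(300.0, 10.0)
after = dies_per_wafer(300.0, 6.0)
print(f"before: {before} dies, after: {after} dies ({after / before:.2f}x)")
```

Note that the die count grows slightly faster than the area ratio alone would suggest, because smaller dies also waste less of the wafer edge.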
Die shrinks also benefit end users: smaller transistors draw less current, so the chip consumes less power at the same clock frequency, or can run at a higher clock rate. A shrink can also allow vendors to add features to the IC that could not be implemented before.
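The power side of this trade-off is commonly reasoned about with the standard CMOS dynamic-power estimate P = a·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). The before/after numbers below are purely hypothetical; the point is only that a shrink lowers C and often V, and that saving can be spent either as lower power at the same clock or as headroom for a faster clock.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Classic CMOS switching-power estimate: P = a * C * V^2 * f, in watts."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical parameters, for illustration only.
F_CLK = 48e6  # Hz, same clock before and after the shrink
p_old = dynamic_power_w(activity=0.1, cap_f=1.0e-9, vdd_v=1.8, freq_hz=F_CLK)
p_new = dynamic_power_w(activity=0.1, cap_f=0.7e-9, vdd_v=1.2, freq_hz=F_CLK)

# Clock the shrunk die could run at within the old power budget. This is an
# upper bound from the power equation alone; the real maximum clock is set
# by timing, not by this formula.
f_same_power = F_CLK * p_old / p_new

print(f"old: {p_old * 1e3:.2f} mW, new: {p_new * 1e3:.2f} mW at {F_CLK / 1e6:.0f} MHz")
print(f"same-power clock bound: {f_same_power / 1e6:.0f} MHz")
```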
Die shrinks are key to improving price/performance at semiconductor companies, and they happen periodically, anywhere from once a year to once every several years.
Figure 1: The Electrical Equivalent Circuit of a Crystal
But what does not change in a die shrink is the package in which the die is mounted. Because the die is smaller, the distance between the die pads and the package pins becomes longer. When the connection between the pad and the pin becomes electrically “longer”, it changes the impedance and resistance (ESR) of the load, and the faster switching speeds of the shrunken die make the added inductance more significant.
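One way to see why extra parasitic resistance matters is the oscillator gain-margin check that several MCU vendors describe (ST's AN2867 is one example): the critical transconductance needed to sustain oscillation is gm_crit = 4·ESR·(2πF)²·(C0 + CL)², and the oscillator's actual gm should exceed it by a comfortable factor (a margin of at least 5 is a commonly cited guideline). The sketch below does not model the package itself. It simply treats whatever the shrink changes as an increase in effective series resistance, which is a simplifying assumption, and the 40 Ω figure and all component values are hypothetical.

```python
import math


def critical_gm(esr_ohm: float, freq_hz: float, c0_f: float, cl_f: float) -> float:
    """Minimum transconductance (A/V) for a Pierce oscillator to sustain oscillation."""
    return 4 * esr_ohm * (2 * math.pi * freq_hz) ** 2 * (c0_f + cl_f) ** 2


def gain_margin(gm_a_per_v: float, esr_ohm: float, freq_hz: float,
                c0_f: float, cl_f: float) -> float:
    """Ratio of the oscillator's gm to the critical gm; larger is safer."""
    return gm_a_per_v / critical_gm(esr_ohm, freq_hz, c0_f, cl_f)


# Hypothetical 8 MHz crystal and oscillator parameters.
F = 8e6          # Hz
C0 = 5e-12       # F, crystal shunt capacitance
CL = 12e-12      # F, load capacitance
GM = 1.5e-3      # A/V, oscillator transconductance
ESR = 80.0       # ohm, crystal ESR
R_EXTRA = 40.0   # ohm, hypothetical increase in effective series resistance

margin_before = gain_margin(GM, ESR, F, C0, CL)
margin_after = gain_margin(GM, ESR + R_EXTRA, F, C0, CL)
print(f"gain margin before: {margin_before:.1f}, after: {margin_after:.1f}")
```

A design that looked comfortable during development can end up below the guideline without anything on the PCB changing, which matches the “it worked for years” symptom described above.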
The larger the MCU package, the more likely it is to be affected by a die shrink. The crystal behaves like an RLC circuit, and without diving into the calculations, the changed R, L, and C parasitics after the die shrink can invalidate the capacitor and/or resistor values that were chosen and tested during the initial design stage.
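For readers who do want a small piece of the calculation: in the standard equivalent circuit of a crystal, a motional branch (L1, C1, R1 in series) sits in parallel with the shunt capacitance C0. The sketch below computes the series resonance fs = 1/(2π√(L1·C1)) and the common approximation for the frequency under a load capacitance CL, fL ≈ fs·(1 + C1 / (2·(C0 + CL))), to show how a couple of picofarads of change in the load moves the frequency. The motional values are hypothetical, chosen to be roughly plausible for an 8 MHz crystal.

```python
import math


def series_resonance_hz(l1_h: float, c1_f: float) -> float:
    """Series resonance of the motional branch: fs = 1 / (2*pi*sqrt(L1*C1))."""
    return 1 / (2 * math.pi * math.sqrt(l1_h * c1_f))


def load_resonance_hz(l1_h: float, c1_f: float, c0_f: float, cl_f: float) -> float:
    """Approximate oscillation frequency with load CL: fs * (1 + C1 / (2*(C0 + CL)))."""
    return series_resonance_hz(l1_h, c1_f) * (1 + c1_f / (2 * (c0_f + cl_f)))


def ppm(f_hz: float, ref_hz: float) -> float:
    """Deviation of f from ref in parts per million."""
    return (f_hz - ref_hz) / ref_hz * 1e6


# Hypothetical motional parameters, roughly plausible for an 8 MHz crystal.
L1, C1, C0 = 26.4e-3, 15e-15, 5e-12

fs = series_resonance_hz(L1, C1)
f_design = load_resonance_hz(L1, C1, C0, 12e-12)   # load the design was tuned for
f_shifted = load_resonance_hz(L1, C1, C0, 14e-12)  # same board with +2 pF of parasitics
print(f"fs = {fs / 1e6:.4f} MHz")
print(f"pulling from +2 pF of load: {ppm(f_shifted, f_design):.1f} ppm")
```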
This phenomenon is similar to a situation you may have met during development: the crystal does not start up, but as soon as you touch the pins with the scope probes to see what is going on, it suddenly begins to oscillate. The probe adds a small amount of extra capacitance, which may be exactly the marginal capacitance the crystal was lacking to start up.
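The load-capacitance formula commonly used for Pierce oscillators makes the probe effect easy to quantify: a probe on one oscillator pin appears in parallel with that pin's capacitor. The sketch below assumes 16 pF load capacitors, 4 pF of stray capacitance and a probe input capacitance of 10 pF; these are hypothetical figures, so check your probe's datasheet for its actual input capacitance.

```python
def load_capacitance_pf(c1_pf: float, c2_pf: float, c_stray_pf: float) -> float:
    """Effective load seen by the crystal: C1 and C2 in series, plus strays (pF)."""
    return c1_pf * c2_pf / (c1_pf + c2_pf) + c_stray_pf


C1 = C2 = 16.0    # pF, hypothetical load capacitors
C_STRAY = 4.0     # pF, assumed pin + PCB stray capacitance
C_PROBE = 10.0    # pF, assumed probe input capacitance

cl_board = load_capacitance_pf(C1, C2, C_STRAY)
# Probe touching the oscillator output pin: its capacitance adds in parallel with C2.
cl_probed = load_capacitance_pf(C1, C2 + C_PROBE, C_STRAY)
print(f"CL without probe: {cl_board:.1f} pF, with probe: {cl_probed:.1f} pF")
```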
So what is the solution?
Actually, not much can be done once this problem occurs, so the most important thing is awareness of it. As firefighters like to say, “the fastest way to extinguish a fire is to prevent it.” So, if adding another 20 to 40 cents to the overall system cost is tolerable, the recommendation is to use a crystal oscillator instead of a bare quartz crystal. Crystal oscillators are a completely integrated solution: the oscillator manufacturer matches the quartz resonator to the oscillator circuit, relieving the board designer of the matching burden. Crystal oscillators also offer many other benefits, such as robust operation, lower sensitivity to EMI and vibration, and guaranteed start-up, just to mention a few.
Another solution is maintenance. The vast majority of reliable vendors issue a Product Change Notification (PCN) whenever they change a part, including a die shrink. Some even add a suffix to the part number to distinguish the die-changed parts. So by monitoring PCNs for die shrinks, the designer can take a board with the old design and the new, die-shrunk MCU and retest it for proper operation in advance, in their “leisure time”. If a problem is detected, the product can be modified to prevent last-minute surprises in mass production, especially when production is done by a CM in another country.