Rabbit holes in HW Lab automation

This post is a reminder about rabbit holes one goes down when working on technical problems.

Some time ago, I received an Inforce IFC6410 to add to my testing while I modernize the thermal sensor driver found on many Qualcomm platforms. The board is built around a fairly old Snapdragon 600 SoC and isn't being tested in any of the kernelci.org labs, which is how a patch that broke the thermal sensor on the board recently got merged.

I figured I'd wire it up to my board automation setup - a glorified mess of an ATX PSU, a switchable USB hub and an 8-port relay - that automates the toggling of power and vbus to a board to allow for automated kernel flashing and testing.

[Image: HW Lab automation - work in progress]

So I wired up the IFC6410, connected a USB-serial cable for console and added the requisite port numbers to my scripts to toggle power to the board when I built a kernel to test on that target.
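For readers who haven't built something like this, here is a minimal sketch of what such a power-cycle step could look like. It is not my actual script: `set_power` and `set_vbus` are hypothetical callables standing in for whatever drives the relay channel and the switchable hub port, and the delays are made-up defaults.

```python
import time
from typing import Callable


def power_cycle(
    set_power: Callable[[bool], None],   # hypothetical: drives the relay channel
    set_vbus: Callable[[bool], None],    # hypothetical: drives the USB hub port
    off_time: float = 2.0,               # assumed delay, not a measured value
    settle_time: float = 0.5,            # assumed delay, not a measured value
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drop vbus and power, wait, then restore power followed by vbus."""
    set_vbus(False)
    set_power(False)
    sleep(off_time)        # let the board fully discharge
    set_power(True)
    sleep(settle_time)     # give the supply time to come up before vbus
    set_vbus(True)


if __name__ == "__main__":
    # Dry run with print stand-ins instead of real hardware and no real sleeping.
    power_cycle(
        set_power=lambda on: print("power", "on" if on else "off"),
        set_vbus=lambda on: print("vbus", "on" if on else "off"),
        sleep=lambda seconds: None,
    )
```

Passing the hardware access in as callables keeps the sequencing separate from the relay and hub specifics, so the same step can be dry-run without a board attached.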

Except that the board didn't enter fastboot mode to flash a new kernel. The messages on the serial console stopped at:

Android Bootloader - UART_DM Initialized!!!
[0] welcome to lk

[10] platform_init()
[10] target_init()
[20] display_init(),target_id=3948.

After ensuring that the DIP switch settings were correct, I started suspecting my wiring. After triple-checking the wiring and the voltages at the barrel plug, I was no closer to finding out what was happening.

After a break, I got the idea to try out another power supply, so I plugged the board into a Rigol power supply. The board booted up to fastboot mode!

Hmm, still some issues in the wiring? Or perhaps the ATX breakout board? After checking whether all my boards together might be drawing more current than the breakout board could supply, and checking my wire gauge, I was still stumped.

The only other thing that stuck out now was that the board was being powered through a relay. Connecting the board directly to the output of the ATX PSU made it boot up. Finally, a clue!

At this point, I'd spent more time than I had budgeted on this little automation project, so I shelved it for a few days. In the meantime, I happened to speak to a friend who suggested checking the voltage on a scope. So the next day, he brought over his scope.

We first attached a capacitor across the supply input of the board, and that made things work even through the relay, but only half the time. A capacitor at the output of the ATX breakout board gave the same result.

Finally, it was time to connect the oscilloscope to the output of the relay. The first picture sent me back to the theory of relays taught during Computer Engineering: the settling time for the voltage in an electromechanical relay, while the contacts bounce.

That's a fair amount of vibration before the relay switches, settles down to its new position and outputs a steady 5 V. So the SoC starts booting up and then the voltage drops precipitously, causing the boot to freeze. Other relays on the relay board showed similar characteristics, so it isn't a bad relay. And other boards work fine, so it isn't necessarily an issue with using relays.
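If the scope can export the trace, the settling time can be put into numbers rather than eyeballed. The following is only a sketch: the function name, the tolerance band and the sample data are all assumptions made for illustration, not values captured from my relay board.

```python
from typing import Optional, Sequence


def settling_time(
    times: Sequence[float],
    volts: Sequence[float],
    target: float = 5.0,
    tolerance: float = 0.25,   # assumed band, pick one that suits the board
) -> Optional[float]:
    """Return the first timestamp after which every remaining sample stays
    within `tolerance` of `target`, or None if the trace never settles."""
    settled_at = None
    for t, v in zip(times, volts):
        if abs(v - target) <= tolerance:
            if settled_at is None:
                settled_at = t      # candidate start of the settled region
        else:
            settled_at = None       # bounced out of the band, start over
    return settled_at


# Made-up trace (time in ms, voltage in V) with two bounces before settling.
example_times = [0, 1, 2, 3, 4, 5, 6]
example_volts = [0.0, 5.0, 0.5, 4.9, 1.0, 5.0, 5.0]

if __name__ == "__main__":
    print(settling_time(example_times, example_volts))
```

Running the same function over captures from the boards that do boot would show whether they simply tolerate the bounce or see a cleaner edge.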

Adding a capacitor adds a delay to the relay switching while the capacitor charges up but doesn't really fix the problem.
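A rough back-of-the-envelope estimate suggests why the capacitor could only help some of the time. While the relay contact is open during a bounce, the capacitor alone feeds the board, and assuming a roughly constant load current I over a gap of length Δt, the rail droops as shown below. The numbers in the example are illustrative assumptions, not measurements from this board.

```latex
\Delta V \approx \frac{I \,\Delta t}{C}
\qquad\text{e.g.}\qquad
\frac{0.5\,\mathrm{A} \times 1\,\mathrm{ms}}{1000\,\mu\mathrm{F}} = 0.5\,\mathrm{V}
```

So whether the board survives a bounce depends on how long the gap happens to be and how much current the SoC is drawing at that moment, which would fit with it working only half the time.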

In the end, I've concluded that this is actually a PMIC issue. Most PMICs have some sort of a POWER GOOD signal that needs to be stable for some time before the PMIC tries to power on the rest of the SoC. IMO, on this board, the firmware is trying to boot the SoC too soon. My suspicion is that this could be fixed in firmware, but given the age of the SoC and board, I'm not going to spend too much time on this. The rest of the boards from Qualcomm work just fine with this relay board, so things did get fixed in later PMICs and firmware. Perhaps I'll find time to verify the waveform on a scope for the other boards.
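To illustrate what "stable for some time" buys you, here is a small simulation sketch. It is not how any particular PMIC or firmware implements this; the function name, threshold, hold time and trace are all assumptions chosen to show the idea.

```python
from typing import Optional, Sequence


def power_good_time(
    times: Sequence[float],
    volts: Sequence[float],
    threshold: float,
    hold: float,
) -> Optional[float]:
    """Return the time at which the rail has stayed at or above `threshold`
    for `hold` time units without interruption, or None if it never does."""
    above_since = None
    for t, v in zip(times, volts):
        if v >= threshold:
            if above_since is None:
                above_since = t          # rail just came into range
            if t - above_since >= hold:
                return t                 # stable long enough, safe to boot
        else:
            above_since = None           # glitch: restart the hold timer
    return None


# Made-up rail trace (time in ms, voltage in V) with one bounce at t = 3 ms.
rail_times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
rail_volts = [0.0, 5.0, 5.0, 0.4, 5.0, 5.0, 5.0, 5.0, 5.0]

if __name__ == "__main__":
    # With no hold time, boot would start right before the bounce.
    print("no hold:", power_good_time(rail_times, rail_volts, 4.5, hold=0))
    # With a 3 ms hold, boot is deferred until after the bounce.
    print("3 ms hold:", power_good_time(rail_times, rail_volts, 4.5, hold=3))
```

With no hold time the sketch declares power good at the first sample in range, just before the rail collapses again, which is the failure mode I suspect on this board. A hold time longer than the bounce pushes the start of boot past it.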

I also found a forum post from 2015 alluding to the same problem with a similar setup. Note the ATX power supply - I bet they were using relays too. :-)

Time to climb out of the rabbit hole.

What do you think? Have I reached a valid conclusion?
