OTGW keeps resetting with reason 'stuck in a loop'

This Forum is about the Opentherm gateway (OTGW) from Schelte

Moderator: hvxl

fcdgw2
Starting Member
Posts: 2
Joined: Mon May 23, 2022 9:47 pm

OTGW keeps resetting with reason 'stuck in a loop'

Post by fcdgw2 »

I connected the OTGW to my AWB boiler, without a thermostat. Almost every time after sending MessageID 57 to the boiler, the communication stops. When the watchdog resets the gateway, the last reset reason is 'stuck in a loop'. The log is shown below.

Judging by the LEDs (LED C configured as B and LED D configured as T), a message seems to be sent to the thermostat after MessageID 57 has been transmitted and received, even though no thermostat is connected.

Code: Select all

21:26:19.592816	OpenTherm Gateway 5.3
21:26:19.634884	Thermostat disconnected
21:26:20.563054	R00000000	Read-Data 	Status: 00000000 00000000
21:26:20.658087	BC0000000	Read-Ack  	Status: 00000000 00000000
21:26:21.581012	R80190000	Read-Data 	Boiler water temperature: 0.00
21:26:22.571315	R80190000	Read-Data 	Boiler water temperature: 0.00
21:26:22.681723	B40191700	Read-Ack  	Boiler water temperature: 23.00
21:26:23.596453	R10010000	Write-Data	Control setpoint: 0.00
21:26:23.691569	BD0010000	Write-Ack 	Control setpoint: 0.00
21:26:24.615230	R00060000	Read-Data 	Remote parameter flags: 00000000 00000000
21:26:25.603912	R00060000	Read-Data 	Remote parameter flags: 00000000 00000000
21:26:25.712486	B40060301	Read-Ack  	Remote parameter flags: 00000011 00000001
21:26:26.619393	R00110000	Read-Data 	Relative modulation level: 0.00
21:26:26.720131	BC0110000	Read-Ack  	Relative modulation level: 0.00
21:26:27.650908	R001B0000	Read-Data 	Outside temperature: 0.00
21:26:28.640688	R001B0000	Read-Data 	Outside temperature: 0.00
21:26:28.735172	B601B0000	Data-Inv  	Outside temperature: 0.00
21:26:29.652226	R801C0000	Read-Data 	Return water temperature: 0.00
21:26:29.763061	B401C1800	Read-Ack  	Return water temperature: 24.00
21:26:30.679113	R900E6400	Write-Data	Maximum relative modulation level: 100.00
21:26:30.774015	B500E6400	Write-Ack 	Maximum relative modulation level: 100.00
21:26:31.706056	R00300000	Read-Data 	DHW setpoint boundaries: 0 0
21:26:31.799355	B40303F26	Read-Ack  	DHW setpoint boundaries: 63 38
21:26:32.714280	R00000000	Read-Data 	Status: 00000000 00000000
21:26:32.825221	BC0000000	Read-Ack  	Status: 00000000 00000000
21:26:33.737059	R80190000	Read-Data 	Boiler water temperature: 0.00
21:26:34.736530	R80190000	Read-Data 	Boiler water temperature: 0.00
21:26:34.830195	B40191700	Read-Ack  	Boiler water temperature: 23.00
21:26:35.745545	R10010000	Write-Data	Control setpoint: 0.00
21:26:35.854108	BD0010000	Write-Ack 	Control setpoint: 0.00
21:26:36.769659	R80380000	Read-Data 	DHW setpoint: 0.00
21:26:37.762926	R80380000	Read-Data 	DHW setpoint: 0.00
21:26:37.873063	BE0383F00	Data-Inv  	DHW setpoint: 63.00
21:26:38.786700	R00390000	Read-Data 	Max CH water setpoint: 0.00
21:26:38.882324	B40394900	Read-Ack  	Max CH water setpoint: 73.00
21:26:57.286287	Command: PR=Q
21:26:57.343862	PR: Q=L
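
For readers who want to check the hex frames in the log by hand, here is a minimal sketch of a decoder. It assumes the standard OpenTherm frame layout (message type in bits 6..4 of the first byte, data ID in the second byte, two data bytes after that). The function name `decode_frame` and the dictionary keys are hypothetical, and the f8.8 interpretation is only meaningful for IDs that carry a temperature or percentage.

```python
# Sketch: decode one OTGW log token such as "B40191700".
# Assumes the standard OpenTherm frame layout; all names are illustrative.

MSG_TYPES = {
    0: "Read-Data",
    1: "Write-Data",
    2: "Invalid-Data",
    3: "Reserved",
    4: "Read-Ack",
    5: "Write-Ack",
    6: "Data-Invalid",
    7: "Unknown-DataId",
}


def decode_frame(token):
    """Split a log token (source letter + 8 hex digits) into its fields."""
    source = token[0]                     # source letter from the log, e.g. R or B
    raw = bytes.fromhex(token[1:9])       # the four message bytes
    msg_type = (raw[0] >> 4) & 0x07       # bits 6..4 of the first byte
    msg_id = raw[1]                       # data ID
    # Data bytes read as a signed fixed-point f8.8 value
    value = int.from_bytes(raw[2:4], "big", signed=True) / 256
    return {
        "source": source,
        "type": MSG_TYPES[msg_type],
        "id": msg_id,
        "f8_8": value,
        "bytes": (raw[2], raw[3]),
    }


if __name__ == "__main__":
    print(decode_frame("B40191700"))
    print(decode_frame("R00390000"))
```

Decoding `R00390000` this way gives a Read-Data request with ID 57 (0x39), which is the last frame in the log before communication stops.
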
The issue seems to be in the code below. When I change the line "movlw 'T'" to "movlw 'M'", the LED doesn't blink any more, while the maintenance LED starts blinking, so I'm pretty sure the issue appears in this part of the code. For some reason the MsgResponse bit seems to be cleared somewhere.

Code: Select all

SendMessage     movlw   1
                movwf   quarter         ;Initialize the state counter
                bsf     NextBit         ;Start bit is 1
                bsf     Transmit        ;Starting to transmit a message
                movlw   'X'             ;Transmit function
                pagesel Message
                call    SwitchOnLED     ;Switch on the transmit LED
                movlw   'T'             ;Thermostat function
                btfss   MsgResponse
SendBoiler      movlw   'B'             ;Boiler function
                ; In package Message
                call    SwitchOnLED     ;Switch on the boiler or thermostat LED
                movlw   SlaveMask       ;Transmitting to the boiler
                btfsc   MsgResponse     ;Check message direction
                movlw   MasterMask      ;Transmitting to the thermostat
                movwf   outputmask
                movlw   34              ;A message is 32 bits + start & stop bit
                movwf   bitcount        ;Load the counter
                bsf     STATUS,RP0
                bsf     PIE1,TMR2IE     ;Enable timer 2 interrupts
                bcf     STATUS,RP0
                movlw   PERIOD
                movwf   TMR2            ;Prepare timer 2 to overflow asap
                bsf     T2CON,TMR2ON    ;Start timer 2
Please let me know if I can do some additional testing for clearing up this issue.
hvxl
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW keeps resetting with reason 'stuck in a loop'

Post by hvxl »

Thank you for the detailed bug report and the extensive investigation. I actually ran into this issue myself a few days ago and I am working on a fix.

When the OTGW is used without a thermostat, it sends a fixed sequence of messages. In this sequence there are also some slots reserved for user messages. For these slots it looks through the list of alternative messages and picks the next one in line. The problem happens when the list of alternative messages is empty. Initially, the list of alternatives contains messages 116 through 123. But if your boiler doesn't support any of those messages, the OTGW will gradually remove them from the list. After a while this may result in an empty list. The list is stored in EEPROM, so once it becomes empty it stays empty unless it is manually filled again.

With that knowledge, there's a simple work-around: Make sure the list of alternatives doesn't become empty. This can be achieved by adding a message that is known to be supported by your boiler. Your logs show that the boiler responds to MsgID 25, so that would be a possible candidate. You can add it with an AA=25 command.
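
The behaviour described above can be modelled in a few lines. This is only a sketch of the description, not the firmware's actual logic: the function names are hypothetical, the pruning happens in one step instead of gradually, and the set of supported IDs is an assumption taken from the IDs the boiler acknowledged in the log.

```python
# Sketch of the alternatives list as described in this post.
# Not the firmware's code; all names are illustrative.

DEFAULT_ALTERNATIVES = list(range(116, 124))  # message IDs 116 through 123


def prune_alternatives(alternatives, supported_ids):
    """Drop every alternative the boiler does not support."""
    return [msg_id for msg_id in alternatives if msg_id in supported_ids]


def add_alternative(alternatives, msg_id):
    """Model of an AA=<id> command: append the ID if it is not listed yet."""
    if msg_id not in alternatives:
        return alternatives + [msg_id]
    return list(alternatives)


def next_alternative(alternatives, position):
    """Pick the next alternative round-robin, or None if the list is empty."""
    if not alternatives:
        return None  # the empty-list case that triggers the problem
    return alternatives[position % len(alternatives)]


if __name__ == "__main__":
    supported = {0, 1, 14, 17, 25, 28, 48, 57}  # assumed, from the log
    remaining = prune_alternatives(DEFAULT_ALTERNATIVES, supported)
    print(remaining, next_alternative(remaining, 0))
    remaining = add_alternative(remaining, 25)
    print(remaining, next_alternative(remaining, 0))
```

With none of 116 through 123 supported the list ends up empty, and adding 25 (the effect of `AA=25`) keeps it non-empty because the boiler keeps answering that ID.
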


TS;WM (too short; want more):

There are actually multiple issues with the code in case of an empty list of alternatives in stand-alone mode:
  • The OTGW sends an invalid message.
  • It doesn't report the message it sends.
You correctly identified a part of the code that is involved in bad behavior. The cause of the problem is actually here:

Code: Select all

RestoreRequest  bcf     AlternativeUsed ;Not providing an alternative after all
                ;At this point W = 0
CreateMessage   iorwf   originaltype,W  ;Combine W and the original message type
                movwf   byte1           ;Set the message type for the response
                movfw   databyte1
                movwf   byte3           ;Restore data byte #1
                movfw   databyte2
                movwf   byte4           ;Restore data byte #2
                movfw   originalreq
                movwf   byte2           ;Restore the message ID
                return
When the OTGW restores the request, it expects the message type to be stored in the originaltype variable. This variable is filled when a message is received from the thermostat. Of course, in stand-alone mode that never happens, so the variable just contains whatever random data happens to be there at power-up. As a result, the first byte of the message is filled with that random data. This may include a set MsgResponse bit, which causes the T LED to blink. In most cases, the boiler won't understand the message and will ignore it. So the OTGW repeats the message, with the same result. After a minute, the OTGW concludes that something is wrong and reboots itself (this is not a WDT reset). This can be fixed by clearing the originaltype variable somewhere. For example here:

Code: Select all

                movlw   T_READ
                movwf   byte1
                clrf    byte3
                clrf    byte4
                clrf    originaltype    ;No thermostat request, so clear the stale type
                bcf     SendUserMessage
                call    SelectMessage   ;Get a message to send from the table
Unfortunately, I didn't catch this with the simulator because that always initializes the RAM to 0.
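
To illustrate why uninitialised RAM matters here, the following sketch models the `iorwf originaltype,W` / `movwf byte1` pair from CreateMessage. It assumes that MsgResponse is bit 6 of the first message byte, which is where the OpenTherm frame format separates slave responses (types 4 to 7) from master requests (types 0 to 3). The constant and function names are hypothetical.

```python
# Sketch: how a stale originaltype can flip the message direction.
# Assumption: MsgResponse corresponds to bit 6 of byte1.

MSG_RESPONSE_BIT = 0x40  # assumed position of MsgResponse in byte1


def create_byte1(w, originaltype):
    """Model of 'iorwf originaltype,W' followed by 'movwf byte1'."""
    return (w | originaltype) & 0xFF


def is_response(byte1):
    """True if the message would be treated as going to the thermostat."""
    return bool(byte1 & MSG_RESPONSE_BIT)


if __name__ == "__main__":
    garbage = 0x5A  # arbitrary stand-in for uninitialised RAM
    print(is_response(create_byte1(0, garbage)))  # direction flipped
    print(is_response(create_byte1(0, 0x00)))     # after clrf originaltype
```

Of the 256 possible byte values, 128 have that bit set, so a stale value can easily mark the message as a response. With the variable cleared (or RAM initialised to 0, as in the simulator), the bit stays clear and the message goes to the boiler.
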

The next issue is the failure of the OTGW to report the message it is sending. When a thermostat is connected, the received message has already been reported. So there is no need to report it again. In stand-alone mode, that is not the case. This can be fixed this way:

Code: Select all

SendDefaultMsg  btfsc   NoThermostat    ;Is a thermostat connected?
                goto    SetParity       ;Calculate the parity of the new message
                return                  ;Send message unmodified to the boiler
SendAltRequest  bsf     AlternativeUsed ;Probably going to send a different msg
                bcf     OverrideUsed    ;Changes will be more drastic
                call    Alternative     ;Get the alternative message to send
                btfss   AlternativeUsed
                goto    SendDefaultMsg  ;There were no other candidates
I will do some more testing to make sure the changes don't cause any new bugs. But you may already want to implement the changes to fix the problem in your setup.
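
For reference, the parity rule that a routine like SetParity has to satisfy can be sketched as follows. OpenTherm frames use even parity over all 32 bits, with the parity bit in the most significant position. The function name `with_parity` is hypothetical and the code is not derived from the firmware.

```python
# Sketch: even parity over a 32-bit OpenTherm frame.

def with_parity(frame):
    """Return the frame with bit 31 set so the total count of 1 bits is even."""
    body = frame & 0x7FFFFFFF          # ignore any existing parity bit
    ones = bin(body).count("1")        # number of 1 bits in the other 31 bits
    return body | ((ones & 1) << 31)   # set parity only if the count is odd


if __name__ == "__main__":
    print(f"{with_parity(0x00190000):08X}")
    print(f"{with_parity(0x00390000):08X}")
```

Applied to a Read-Data request for ID 25 (`0x00190000`), this yields `80190000`, matching the `R80190000` frames in the log above. ID 57 already has an even bit count, so `00390000` is unchanged.
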
Schelte
fcdgw2
Starting Member
Posts: 2
Joined: Mon May 23, 2022 9:47 pm

Re: OTGW keeps resetting with reason 'stuck in a loop'

Post by fcdgw2 »

Thank you for your extensive reply! I did some testing yesterday with the additional code, and I have stable communication now! :)

I also tried to view the contents of byte1 through byte4 with the debug pointer, but couldn't find which file registers I have to check. Do you have an overview of the addresses?
hvxl
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW keeps resetting with reason 'stuck in a loop'

Post by hvxl »

An overview of the addresses assigned to symbols can be found in the map file produced by the linker (dist/default/production/gateway.X.production.map):
byte1 0x000076
byte2 0x000077
byte3 0x000078
byte4 0x000079
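
If more symbols are needed, lines in the "name address" form shown above can be collected into a lookup table. This is a minimal sketch: the helper name `parse_symbols` is hypothetical, and it assumes each relevant line starts with the symbol name followed by a hexadecimal address. The real map file may contain extra columns and header lines, which this simply ignores or skips.

```python
# Sketch: build a symbol -> address table from "name 0xADDR" style lines.

def parse_symbols(lines):
    """Return a dict of symbol names to integer addresses."""
    symbols = {}
    for line in lines:
        parts = line.split()
        # Keep only lines whose second field looks like a hex address
        if len(parts) >= 2 and parts[1].lower().startswith("0x"):
            symbols[parts[0]] = int(parts[1], 16)
    return symbols


if __name__ == "__main__":
    excerpt = [
        "byte1 0x000076",
        "byte2 0x000077",
        "byte3 0x000078",
        "byte4 0x000079",
    ]
    for name, addr in parse_symbols(excerpt).items():
        print(f"{name}: 0x{addr:02X}")
```
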
Schelte
hvxl
Senior Member
Posts: 1965
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW keeps resetting with reason 'stuck in a loop'

Post by hvxl »

Firmware versions 6.1 (for PIC16F1847) and 5.4 (for PIC16F88) have been released to fix this issue.
Schelte