OTGW system error

This Forum is about the Opentherm gateway (OTGW) from Schelte

Moderator: hvxl

AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

Wow. Brilliant. Many thanks. I will give it a try.
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

Ok. I have flashed to v5.3.1 and it is running now. :)

I will continue logging for a couple of weeks and/or until another crash occurs, and I will report back in any case.

Obviously the OTMonitor 'Transfer EEPROM settings' checkbox was disabled, so the settings under the new v5.3.1 test run might be slightly different from before. As far as I recall, my only customisations were a) the LED and GPIO assignments, and b) sending an AA=26 command, both of which I have replicated in v5.3.1. But I am not 100% sure whether I had something else in v5.3 that was not carried over. Who knows, maybe some other setting contributed to the crashes, and its absence may now have unwittingly resolved the problem?

Obviously also, the re-flash may coincidentally have cleared the stuck EEPROM bit that you hypothesised. Let's see..
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

PS apropos the cause of the crashes: My OTGW is the one from Nobo, installed in their polycarbonate box, mounted below my Viessmann boiler, so I wonder if EMI from the boiler's spark igniter could possibly be the cause of the crashes. Did you ever experience any such EMI issues before? Perhaps I should put it in a metal box?
hvxl
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW system error

Post by hvxl »

AndrewFG wrote:Obviously the OTMonitor 'Transfer EEPROM settings' checkbox was disabled
No, that's not obvious. You must not have used the latest OTmonitor.
Schelte
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

hvxl wrote: No, that's not obvious. You must not have used the latest OTmonitor.
You are right. I was using OTMonitor v5.2. But anyway the flash is done now, so any odd EEPROM settings will have been lost..
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

code for the PR=Q command to report when the last reset was due to a WDT time-out
That’s certainly useful. But (just a polite idea) it might help even more if there were some way to capture the stack trace just prior to the error. I don’t know anything about this processor or the available libraries, so I have no idea if that is even possible. So please ignore me if it is not..
hvxl
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW system error

Post by hvxl »

Quote from the data sheet: "The stack space is not part of either program or data space and the Stack Pointer is not readable or writable." So unfortunately it's not possible to get a stack trace. Maybe if you run it with an in-circuit debugger. But that is going to be difficult with a problem that only happens once every couple of weeks.
Schelte
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

hvxl wrote:not possible to get a stack trace
Ok.
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

I don't know if it is relevant, but in my ongoing logging, I noticed this morning the following 9 messages. The OpenHAB binding considered them to be 'unknown' messages, so they do stand out. This sequence of 9 messages was repeated three times. However the OTGW did NOT go into 'WDT reset'. And after the third repetition, everything went back to normal operation.

Code:

2022-03-22 08:51:47.741 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 7409F5, (unknown)
2022-03-22 08:51:47.743 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 00E6400, (unknown)
2022-03-22 08:51:47.744 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 00E6400, (unknown)
2022-03-22 08:51:47.745 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 0780000, (unknown)
2022-03-22 08:51:47.746 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 078163F, (unknown)
2022-03-22 08:51:47.747 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 0000100, (unknown)
2022-03-22 08:51:47.748 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 000010A, (unknown)
2022-03-22 08:51:47.749 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 0190000, (unknown)
2022-03-22 08:51:47.751 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: 0193A4D, (unknown)
hvxl
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am

Re: OTGW system error

Post by hvxl »

Considering how the OTGW works internally, it is highly unlikely that it would fail in this way. So, until proven otherwise, I'm blaming this on the OpenHAB binding or the USR-TCP232.
Schelte
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

It is certainly not the binding (it has no way to 'imagine' reading unknown rubbish). But it might indeed be the USR-TCP. I have no way of knowing, however..
hvxl
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

It's not exactly unknown rubbish. These look like regular messages with the first 2 characters missing.
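To make that concrete, here is a minimal Python sketch (not OTGW or binding code) that tells a complete message from one that has lost its first two characters. It assumes a complete message is one source letter followed by 8 hex digits; the logs in this thread only show R and B, so accepting T and A as well is an assumption, as are the names FULL, TRUNCATED and classify. The point is that the last six hex digits (data ID and data value) are still readable in a truncated line.

```python
import re

# Assumed shape of a complete message: source letter + 8 hex digits.
# Only R and B appear in the logs here; T and A are an assumption.
FULL = re.compile(r"^[TBRA][0-9A-F]{8}$")
# Hypothetical pattern for a message missing its first two characters.
TRUNCATED = re.compile(r"^[0-9A-F]{7}$")


def classify(line):
    """Return (kind, data_id, data_value); kind is 'full', 'truncated' or 'other'."""
    if FULL.match(line):
        kind = "full"
    elif TRUNCATED.match(line):
        kind = "truncated"
    else:
        return "other", None, None
    # In both cases the last six hex digits survive:
    # two digits of data ID followed by four digits of data value.
    data_id = int(line[-6:-4], 16)
    data_value = int(line[-4:], 16)
    return kind, data_id, data_value


if __name__ == "__main__":
    for line in ["B500E6400", "00E6400", "078163F", "7409F5"]:
        print(line, classify(line))
```

A line such as 7409F5 has lost more than two characters, so the sketch reports it as 'other' rather than guessing.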
Schelte
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

^
Indeed. And the messages seem to come pair-wise, starting with xxx00 and followed by xxxNN with the last two characters different. So there is certainly some structure there.

If you think the USR-TCP might be responsible for dropping the first characters, I have the following questions..
  - Have you any idea why it might do that?
  - Can you imagine any way that such an error in sending outgoing messages might cause an error on incoming messages, that might cause a WDT reset? (Personally I cannot imagine such..)
  - Just an off the wall idea: does the OTGW terminate its sent messages with \r or with \r\n? I wonder if extra \n characters may be causing proper message characters to be lost?? (But again I can’t imagine how that might influence WDT reset..)
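
On the line-ending question, the receiving side can at least be made indifferent to the answer. A minimal Python sketch, assuming raw bytes arrive from the socket in arbitrary chunks; the class name LineAssembler is hypothetical and this is not how the OpenHAB binding is known to work:

```python
class LineAssembler:
    """Collect raw bytes and return complete lines, accepting CR, LF or CRLF."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add a chunk of bytes; return the list of lines completed so far."""
        self._buf.extend(chunk)
        lines = []
        start = 0
        for i, byte in enumerate(self._buf):
            if byte in (0x0D, 0x0A):  # CR or LF ends the current line
                if i > start:  # skip the empty line between CR and LF
                    lines.append(self._buf[start:i].decode("ascii", "replace"))
                start = i + 1
        # Keep only the unfinished tail for the next call.
        del self._buf[:start]
        return lines


if __name__ == "__main__":
    assembler = LineAssembler()
    for chunk in (b"R8019", b"0000\r\nB40192C19\r", b"\nR9001", b"3200\r"):
        print(assembler.feed(chunk))
```

With a reader like this, an extra \n cannot eat characters from the next message, so if characters still go missing the loss must happen before the bytes reach the reader.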
hvxl
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

As I said, otherwise normal messages. You get those in pairs as well: First a request (R) and then a response (B).
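
As a rough illustration of the pairing, here is a minimal Python sketch that splits a message into its parts. It assumes the standard OpenTherm frame layout (parity bit, 3-bit message type, 4 spare bits, 8-bit data ID, 16-bit data value); the names MSG_TYPES and decode are hypothetical. A request and its response share the same data ID, which is why the messages appear in pairs.

```python
# OpenTherm message types, indexed by the 3-bit type field.
MSG_TYPES = {
    0: "READ-DATA", 1: "WRITE-DATA", 2: "INVALID-DATA", 3: "reserved",
    4: "READ-ACK", 5: "WRITE-ACK", 6: "DATA-INVALID", 7: "UNKNOWN-DATAID",
}


def decode(line):
    """Split a message such as 'R90013200' into (source, type, data_id, value)."""
    source = line[0]                 # R = request, B = response
    frame = int(line[1:], 16)        # the 32-bit frame as 8 hex digits
    msg_type = (frame >> 28) & 0x7   # bits 30..28; bit 31 is parity
    data_id = (frame >> 16) & 0xFF   # bits 23..16
    value = frame & 0xFFFF           # bits 15..0
    return source, MSG_TYPES[msg_type], data_id, value


if __name__ == "__main__":
    for line in ("R90013200", "B50013200", "R00740000", "B40740CD6"):
        print(line, decode(line))
```

For R90013200 and B50013200 the data ID is 1 in both, and the value 0x3200 read as a fixed-point number with 8 fractional bits is 50.0.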

I think this is in no way related to the WDT reset.
Schelte
AndrewFG
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

Arrgh! The OTGW just crashed again. This time, it sent two garbage messages (see log below) and then simply stopped working. There was no WDT reset. The boiler console showed an OpenTherm comms error, so the OTGW was communicating neither on the OT bus nor on the serial side. And I had to power cycle the unit.

Code:

2022-04-02 06:24:10.305 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-04-02 06:24:10.337 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-04-02 06:24:10.338 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-04-02 06:24:10.357 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-04-02 06:24:10.582 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-04-02 06:24:10.729 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-04-02 06:24:11.609 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-04-02 06:24:11.738 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740CD6
2022-04-02 06:24:12.627 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-04-02 06:24:12.837 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-04-02 06:24:13.661 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-04-02 06:24:13.851 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-04-02 06:24:14.686 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-04-02 06:24:14.861 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078166C
2022-04-02 06:24:15.712 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-04-02 06:24:15.864 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-04-02 06:24:16.740 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-04-02 06:24:16.881 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B401A29CD
2022-04-02 06:24:17.762 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-04-02 06:24:18.084 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-04-02 06:24:18.791 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-04-02 06:24:19.090 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40192C19
2022-04-02 06:24:19.817 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-04-02 06:24:20.099 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-04-02 06:24:20.842 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-04-02 06:24:21.107 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-04-02 06:24:21.871 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-04-02 06:24:22.122 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-04-02 06:24:22.897 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-04-02 06:24:23.124 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740CD6
2022-04-02 06:24:23.915 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-04-02 06:24:24.137 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078166C
2022-04-02 06:24:24.939 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-04-02 06:24:25.246 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-04-02 06:24:25.976 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-04-02 06:24:26.254 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192BE6
2022-04-02 06:24:26.997 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-04-02 06:24:27.262 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-04-02 06:24:28.021 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-04-02 06:24:28.470 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read:   �     $ ��          �        @      0                                                �z�� @)��  D �yX�   �  ��1f � �� ��
2022-04-02 06:24:28.470 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message:   �     $ ��          �        @      0                                                �z�� @)��  D �yX�   �  ��1f � �� ��, (unknown)
2022-04-02 06:24:28.472 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read:    �A
2022-04-02 06:24:28.472 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message:    �A, (unknown)
2022-04-02 06:24:40.306 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-04-02 06:24:48.492 [WARN ] [rnal.OpenThermGatewaySocketConnector] - Error communicating with OpenTherm Gateway: 'Read timed out'
2022-04-02 06:24:48.500 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - OpenThermGatewaySocketConnector disconnected
2022-04-02 06:25:48.505 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Stopping OpenThermGatewaySocketConnector
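
For anyone wanting to search weeks of such logs for the same pattern, here is a minimal Python sketch. It assumes the log line layout shown above; the names LOG_LINE, GOOD_READ and scan, and the 5 second gap threshold, are hypothetical choices of mine. It flags 'Read:' payloads that are neither a letter plus 8 hex digits nor a two-letter command reply, and it flags silent gaps between consecutive log lines.

```python
import re
from datetime import datetime

# Timestamp, level, logger name, then the message text.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\s*\] \[[^\]]*\] - (.*)$"
)
# A plausible read: an OpenTherm message, or a command reply like 'CS: 50.00'.
GOOD_READ = re.compile(r"^Read: ([TBRA][0-9A-F]{8}|[A-Z]{2}: .*)$")


def scan(lines, max_gap=5.0):
    """Return a list of ('garbage' | 'gap', timestamp, gap_seconds_or_None)."""
    findings = []
    previous = None
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a log line in the expected layout
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        text = match.group(3)
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap > max_gap:
                findings.append(("gap", match.group(1), round(gap, 3)))
        if text.startswith("Read: ") and not GOOD_READ.match(text):
            findings.append(("garbage", match.group(1), None))
        previous = stamp
    return findings


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for finding in scan(log.read().splitlines()):
            print(finding)
```

Run it with the log file name as the only argument; the threshold can be raised if normal traffic is slower than the roughly one message per second seen here.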