OTGW system error

This Forum is about the Opentherm gateway (OTGW) from Schelte

Moderator: hvxl

AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

hvxl wrote:I thought you said the CS command had 8 decimal places of precision. That is not reflected by this log.
Yup. I already changed the binding code to round to a single place of decimals. Just in case.
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

OK. Then let's see what happens this way.

One interesting thing I noticed from your last posts is that you are sending the CS and CH commands back to back. The OTGW can only handle one command at a time. It relies on the delay introduced by the serial transmission of a character at 9600 baud to process the previous command. I don't expect that parsing the CS command will take 1ms, but this is something the excessive decimals may have an impact on. It's still not correct that the OTGW fails to handle that, but it's a possible scenario to keep in mind. It doesn't explain why you see the problem more frequently on firmware 5.3 than on 5.2.
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

hvxl wrote:sending the CS and CH commands back to back
Aha. Interesting. I think the situation may even be not just a "back to back" problem, but possibly even an "overlapping" problem!!

A Java TCP socket connection has a write buffer and a read buffer, and in this OpenHAB binding there is one thread that writes to the one, and another that reads from the other; and so I think it is possible that two (or more) messages may be written before anything has been read back.

It is really hard to figure out all the timing combinations because there are so many places where buffering or overlapping could possibly occur: OpenHAB write thread => Java JVM client TCP write buffer => router/switch/cloud => USR-TCP232 server TCP read buffer => OTGW RS-232 port RX => OTGW processing => OTGW RS-232 port TX => USR-TCP232 server TCP write buffer => router/switch/cloud => Java JVM client TCP read buffer => OpenHAB read thread; it makes my head hurt :)

But this certainly started me thinking that I may need to change the pipelining model of the OpenHAB binding with specific locks so that no new command is sent until the prior command(s) have been ack'ed (or timed-out). I did not write the binding myself, so I may need to think deeply about this before introducing any changes though.

In the meantime, I have the current binding running with the logging mentioned above; so let's see what we may find..
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

Both commands are sent on the same socket. TCP will take care that they arrive in the order they were transmitted. The commands will then also be sent over the serial line in that order. So there can not be any overlapping. The OTGW will normally be able to process a command in the time it takes to transmit the first character of the new command at 9600 baud (about 1ms). So it should be ready in time to receive the second command. (This is one reason I chose this low baud rate).

As you will have seen, sending the two commands together usually works fine. But there can possibly be situations where the PIC is already doing something else when the first command arrives. As a result, it may start processing the first command after a delay. The CS command also takes more than average processing time. This may make the PIC late for receiving the second command, resulting in an overrun error. And maybe that's not handled correctly. I'm just speculating. It's difficult to simulate this situation, so it's not easy to verify.
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

hvxl wrote:Both commands are sent on the same socket. TCP will take care that they arrive in the order they were transmitted.
Ah. Many thanks for the helpful clarification.
hvxl wrote:difficult to simulate this situation, so it's not easy to verify.
Thinking about this further, if may be that turning on the logging will have inherently solved the problem. There are two parallel threads in the binding for reading and writing, but they are constrained through the logger's write access to the output file, so the parallel threads are now essentially being force serialized. So I am expecting that we won't find anything wrong now. But I will let it run for week or two to see if anything does appear.
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

^
Just for info, I have reported this (potential) issue concerning the OpenHAB binding https://github.com/openhab/openhab-addons/issues/12457. And I am working on a solution in the binding code so that commands will be placed in a FIFO queue to ensure that second or subsequent commands are not transmitted to the OTGW until the binding has received the 'ack' message for the prior command. Or if the ack does not arrive, to attempt re-sending the command (until a maximum timeout).
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

The OTGW should be able to handle it. And we don't even know yet if that actually is the problem.
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

Below is an example of the logging with the new code with the FIFO command queue; you can see that now it does not send the CH command until it has the 'ack' for the CS command.

Code: Select all

2022-03-10 20:03:59.963 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:00.132 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:00.988 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-10 20:04:01.140 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40194266
2022-03-10 20:04:02.011 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R10010000
2022-03-10 20:04:02.147 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BD0011400
2022-03-10 20:04:02.171 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=0.0
2022-03-10 20:04:02.198 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 0.00
2022-03-10 20:04:02.202 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=0
2022-03-10 20:04:02.221 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 0
2022-03-10 20:04:03.036 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-10 20:04:03.263 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A3E00
2022-03-10 20:04:04.075 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-10 20:04:04.274 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-10 20:04:05.094 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-10 20:04:05.279 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-10 20:04:06.118 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-10 20:04:06.288 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740698
2022-03-10 20:04:07.142 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-10 20:04:07.294 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-10 20:04:08.166 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-10 20:04:08.302 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0781613
2022-03-10 20:04:09.196 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:09.400 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:10.223 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-10 20:04:10.410 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40194266
2022-03-10 20:04:10.531 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=0.0
2022-03-10 20:04:10.557 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 0.00
2022-03-10 20:04:10.558 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=0
2022-03-10 20:04:10.577 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 0
2022-03-10 20:04:11.248 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R10010000
2022-03-10 20:04:11.416 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BD0011400
2022-03-10 20:04:12.271 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-10 20:04:12.424 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-10 20:04:13.296 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-10 20:04:13.431 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-10 20:04:14.324 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-10 20:04:14.546 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B401A3DE6
2022-03-10 20:04:15.353 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-10 20:04:15.556 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740698
2022-03-10 20:04:16.375 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:16.559 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:17.402 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

I would prefer to just get a log of the original failing case, without changing a whole bunch of things that may leave us in the dark about what the problem actually was.

I understand you simply want it to work. But I will not feel at ease, knowing there's a bug somewhere in the firmware that you worked around for now, but that may come back to bite someone at some point in the future.
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

The OTGW has been running for ca. one week without error, and last night it went into lockout. I am sending CS and CH commands every 30 seconds which means the failure occurred after about 20'000 successful commands. Below is the log for about the last two minutes leading up to the failure. Interesting is the response at 17:29:20.012 just prior to the crash, which I think would normally be the response to a PR=A command, although the binding did NOT send any such PR command!!

Code: Select all

2022-03-17 17:27:50.247 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:27:50.275 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:27:50.275 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:27:50.295 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:27:50.798 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:27:51.026 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:27:51.816 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:27:52.030 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:27:52.837 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:27:53.040 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:27:53.867 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:27:54.049 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:27:54.889 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:27:55.040 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:27:55.912 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:27:56.055 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:27:56.937 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:27:57.272 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192C66
2022-03-17 17:27:57.970 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:27:58.283 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:27:58.991 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:27:59.293 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:00.014 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:00.297 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:01.042 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:01.306 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:02.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:02.303 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:03.088 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:03.306 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:04.118 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:04.314 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:05.138 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:05.323 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:06.164 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:06.440 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40192C19
2022-03-17 17:28:07.182 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:07.449 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:08.207 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:08.458 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:09.232 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:09.466 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:10.247 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:28:10.261 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:10.277 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:28:10.278 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:28:10.297 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:28:10.474 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:11.282 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:11.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:12.307 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:12.492 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:13.332 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:13.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192BE6
2022-03-17 17:28:14.356 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:14.605 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:15.381 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:15.611 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:16.401 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:16.621 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:17.427 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:17.628 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:18.452 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:18.636 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:19.479 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:19.646 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:20.502 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:20.654 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:21.526 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:21.758 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:22.560 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:22.761 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192BB3
2022-03-17 17:28:23.584 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:23.770 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:24.609 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:24.778 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:25.635 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:25.791 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:26.665 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:26.802 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:27.686 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:27.897 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:28.725 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:28.907 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:29.752 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:29.916 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192B80
2022-03-17 17:28:30.779 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:30.931 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:31.805 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:31.941 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:32.830 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:33.053 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:33.849 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:34.045 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:34.869 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:35.059 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:35.894 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:36.069 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:36.928 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:37.080 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:37.946 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:38.085 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:38.971 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:39.205 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192D19
2022-03-17 17:28:39.991 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:40.209 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:41.016 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:41.221 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:42.041 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:42.208 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:43.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:43.217 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:44.089 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:44.226 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:45.117 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:45.344 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:46.135 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:46.357 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B401932CD
2022-03-17 17:28:47.162 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:47.362 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:48.186 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:48.371 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:28:49.216 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:49.383 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:50.241 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:50.243 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:28:50.269 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:28:50.270 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:28:50.289 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:28:50.393 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:51.260 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:51.490 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:52.294 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:52.500 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:53.320 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:53.506 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A33
2022-03-17 17:28:54.346 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:54.520 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:55.371 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:55.524 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193980
2022-03-17 17:28:56.397 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:56.626 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:57.432 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:57.634 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:58.457 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:58.642 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:59.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:59.652 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:00.508 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:29:00.661 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:29:01.533 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:01.673 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:02.559 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:02.774 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40193933
2022-03-17 17:29:03.594 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:29:03.779 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:29:04.621 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:29:04.788 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:29:05.648 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:29:05.797 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:29:06.669 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:29:06.805 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:29:07.695 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:29:07.928 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:08.720 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:29:08.931 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:29:09.738 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:29:09.939 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:29:10.244 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:29:10.270 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:29:10.271 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:29:10.291 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:29:10.765 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:10.930 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:11.792 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:11.945 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193866
2022-03-17 17:29:12.811 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:29:12.948 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:29:13.836 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:29:14.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:29:14.855 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:29:15.074 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:29:15.881 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:29:16.085 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A33
2022-03-17 17:29:16.907 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:29:17.097 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:17.934 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:18.101 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:18.954 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:19.107 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193A19
2022-03-17 17:29:20.012 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: OpenTherm Gateway 5.3
2022-03-17 17:29:20.013 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: OpenTherm Gateway 5.3, (unknown)
2022-03-17 17:29:20.015 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: WDT reset!
2022-03-17 17:29:20.015 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: WDT reset!, (unknown)
2022-03-17 17:29:40.027 [WARN ] [rnal.OpenThermGatewaySocketConnector] - Error communicating with OpenTherm Gateway: 'Read timed out'
2022-03-17 17:29:40.038 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - OpenThermGatewaySocketConnector disconnected
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

The log shows that the issue is unrelated to the CS command. The WDT reset happens 10 seconds after the last CS command. The watchdog timer runs out after 16ms. So the changes you made were barking up the wrong tree.

In fact, the logs show nothing special going on at all before the WDT reset happens. It's just plodding along, doing exactly the same things that many other OTGWs also do. And nobody else has reported a problem. Do you happen to have anything configured on the GPIO pins? For example, reading a temperature sensor happens in the background. So a potential problem there would not show up in the logs. In stand-alone mode, a temperature sensor has no added value, so I would suggest to disable it.

You indicated that the problem happens more frequently on v5.3 than on v5.2. However, the only change between the two version is in the handling of serial commands. But there is no serial command being performed when your OTGW resets. So, I'm starting to wonder if perhaps on V5.2 you saw the problem that was fixed. And on v5.3 there is a bit somewhere in the code that is not fully binary. Most of the time it reads correctly. But sometimes it gives a 0 instead of a 1, or the other way around. That causes the program to derail, which is then caught by the watchdog. This is just speculation, but the best idea I can come up with at the moment. If it is what is happening, it should hopefully be resolved by reflashing the firmware once more.

The "OpenTherm Gateway 5.3" is the normal banner the OTGW sends after a reset. The "WDT reset!" message is reported after the reset happens. And also after the banner is printed. So that is normal in case of a WDT reset. (Although a WDT reset is not normal.) You can tell it's the startup banner because the response to a PR=A command has some additional characters at the start: "PR: A=OpenTherm Gateway 5.3".
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

> anything configured on the GPIO pins?

I have GPIO ports A & B mapped to LED E & F respectively, which are flashing on Transmitting & Receiving an OpenTherm Message respectively. Maybe that is too much GPIO / LED activity? Perhaps I should connect the GPIO's to 'none' instead ??

> indicated that the problem happens more frequently on v5.3 than on v5.2

I think that was a chance observation. On v5.2 I got one failure in two weeks. The on v5.3 I got three failures in one week. But with the logging enabled on v5.3 I got one failure in one week. So the difference is marginal.

> hopefully be resolved by reflashing the firmware

That sounds highly unlikely, but I can try it. I will let you know.

You also suggested an alternate 'solution' that might still work -- namely connecting 'Boiler Maintenance' to GPIO A or B, and running a jumper wire to the RST pin. Is that really likely to work? I sometimes observe a fast blink on the LEDs when power cycling the OTGW, so I fear that that might cause a perpetual spiral of self RST...
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

Having LEDs configured on the GPIO pins should not be a problem. They don't need complicated background processing.

I agree that an unstable bit is highly unlikely. But I can't think of any more likely reason, given the point at which it fails and the amount of other people that use the same functionality without problems.

After a reset, only LEDs A, B, C, and D flash. That's why I suggested to use one of the GPIO ports. But I haven't tested it. It may be easier to modify the firmware to not go into bare minimum mode after a WDT reset. Just report it and resume normal operation.
Schelte
AndrewFG
Starting Member
Starting Member
Posts: 49
Joined: Fri Jan 07, 2022 7:50 pm

Re: OTGW system error

Post by AndrewFG »

> may be easier to modify the firmware

Crikey! The last time I wrote anything in assembler was for the 6502 literally decades ago..
hvxl
Senior Member
Senior Member
Posts: 1959
Joined: Sat Jun 05, 2010 11:59 am
Contact:

Re: OTGW system error

Post by hvxl »

In this case it's not so much writing assembly code, but deleting it. To create the attached firmware, I deleted the parts where, after a WDT reset, the code:
  • Configures the comparators to pass-through mode.
  • Skips enabling the serial receiver.
  • Skips enabling interrupts.
  • Switches on the maintenance LED.
  • Takes a short-cut in the main loop.
And, just for good measure, I added code for the PR=Q command to report when the last reset was due a WDT time-out (W).
Attachments
gateway-5.3.1.zip
Firmware that continues to run normally after a WDT reset.
(9.4 KiB) Downloaded 105 times
Schelte
Post Reply

Return to “Opentherm Gateway Forum”