Yup. I already changed the binding code to round to a single place of decimals. Just in case.hvxl wrote:I thought you said the CS command had 8 decimal places of precision. That is not reflected by this log.
OTGW system error
Moderator: hvxl
Re: OTGW system error
Re: OTGW system error
OK. Then let's see what happens this way.
One interesting thing I noticed from your last posts is that you are sending the CS and CH commands back to back. The OTGW can only handle one command at a time. It relies on the delay introduced by the serial transmission of a character at 9600 baud to process the previous command. I don't expect that parsing the CS command will take 1ms, but this is something the excessive decimals may have an impact on. It's still not correct that the OTGW fails to handle that, but it's a possible scenario to keep in mind. It doesn't explain why you see the problem more frequently on firmware 5.3 than on 5.2.
One interesting thing I noticed from your last posts is that you are sending the CS and CH commands back to back. The OTGW can only handle one command at a time. It relies on the delay introduced by the serial transmission of a character at 9600 baud to process the previous command. I don't expect that parsing the CS command will take 1ms, but this is something the excessive decimals may have an impact on. It's still not correct that the OTGW fails to handle that, but it's a possible scenario to keep in mind. It doesn't explain why you see the problem more frequently on firmware 5.3 than on 5.2.
Schelte
Re: OTGW system error
Aha. Interesting. I think the situation may even be not just a "back to back" problem, but possibly even an "overlapping" problem!!hvxl wrote:sending the CS and CH commands back to back
A Java TCP socket connection has a write buffer and a read buffer, and in this OpenHAB binding there is one thread that writes to the one, and another that reads from the other; and so I think it is possible that two (or more) messages may be written before anything has been read back.
It is really hard to figure out all the timing combinations because there are so many places where buffering or overlapping could possibly occur: OpenHAB write thread => Java JVM client TCP write buffer => router/switch/cloud => USR-TCP232 server TCP read buffer => OTGW RS-232 port RX => OTGW processing => OTGW RS-232 port TX => USR-TCP232 server TCP write buffer => router/switch/cloud => Java JVM client TCP read buffer => OpenHAB read thread; it makes my head hurt
But this certainly started me thinking that I may need to change the pipelining model of the OpenHAB binding with specific locks so that no new command is sent until the prior command(s) have been ack'ed (or timed-out). I did not write the binding myself, so I may need to think deeply about this before introducing any changes though.
In the meantime, I have the current binding running with the logging mentioned above; so let's see what we may find..
Re: OTGW system error
Both commands are sent on the same socket. TCP will take care that they arrive in the order they were transmitted. The commands will then also be sent over the serial line in that order. So there can not be any overlapping. The OTGW will normally be able to process a command in the time it takes to transmit the first character of the new command at 9600 baud (about 1ms). So it should be ready in time to receive the second command. (This is one reason I chose this low baud rate).
As you will have seen, sending the two commands together usually works fine. But there can possibly be situations where the PIC is already doing something else when the first command arrives. As a result, it may start processing the first command after a delay. The CS command also takes more than average processing time. This may make the PIC late for receiving the second command, resulting in an overrun error. And maybe that's not handled correctly. I'm just speculating. It's difficult to simulate this situation, so it's not easy to verify.
As you will have seen, sending the two commands together usually works fine. But there can possibly be situations where the PIC is already doing something else when the first command arrives. As a result, it may start processing the first command after a delay. The CS command also takes more than average processing time. This may make the PIC late for receiving the second command, resulting in an overrun error. And maybe that's not handled correctly. I'm just speculating. It's difficult to simulate this situation, so it's not easy to verify.
Schelte
Re: OTGW system error
Ah. Many thanks for the helpful clarification.hvxl wrote:Both commands are sent on the same socket. TCP will take care that they arrive in the order they were transmitted.
Thinking about this further, if may be that turning on the logging will have inherently solved the problem. There are two parallel threads in the binding for reading and writing, but they are constrained through the logger's write access to the output file, so the parallel threads are now essentially being force serialized. So I am expecting that we won't find anything wrong now. But I will let it run for week or two to see if anything does appear.hvxl wrote:difficult to simulate this situation, so it's not easy to verify.
Re: OTGW system error
^
Just for info, I have reported this (potential) issue concerning the OpenHAB binding https://github.com/openhab/openhab-addons/issues/12457. And I am working on a solution in the binding code so that commands will be placed in a FIFO queue to ensure that second or subsequent commands are not transmitted to the OTGW until the binding has received the 'ack' message for the prior command. Or if the ack does not arrive, to attempt re-sending the command (until a maximum timeout).
Just for info, I have reported this (potential) issue concerning the OpenHAB binding https://github.com/openhab/openhab-addons/issues/12457. And I am working on a solution in the binding code so that commands will be placed in a FIFO queue to ensure that second or subsequent commands are not transmitted to the OTGW until the binding has received the 'ack' message for the prior command. Or if the ack does not arrive, to attempt re-sending the command (until a maximum timeout).
Re: OTGW system error
The OTGW should be able to handle it. And we don't even know yet if that actually is the problem.
Schelte
Re: OTGW system error
Below is an example of the logging with the new code with the FIFO command queue; you can see that now it does not send the CH command until it has the 'ack' for the CS command.
Code: Select all
2022-03-10 20:03:59.963 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:00.132 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:00.988 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-10 20:04:01.140 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40194266
2022-03-10 20:04:02.011 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R10010000
2022-03-10 20:04:02.147 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BD0011400
2022-03-10 20:04:02.171 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=0.0
2022-03-10 20:04:02.198 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 0.00
2022-03-10 20:04:02.202 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=0
2022-03-10 20:04:02.221 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 0
2022-03-10 20:04:03.036 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-10 20:04:03.263 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A3E00
2022-03-10 20:04:04.075 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-10 20:04:04.274 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-10 20:04:05.094 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-10 20:04:05.279 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-10 20:04:06.118 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-10 20:04:06.288 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740698
2022-03-10 20:04:07.142 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-10 20:04:07.294 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-10 20:04:08.166 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-10 20:04:08.302 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0781613
2022-03-10 20:04:09.196 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:09.400 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:10.223 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-10 20:04:10.410 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40194266
2022-03-10 20:04:10.531 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=0.0
2022-03-10 20:04:10.557 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 0.00
2022-03-10 20:04:10.558 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=0
2022-03-10 20:04:10.577 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 0
2022-03-10 20:04:11.248 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R10010000
2022-03-10 20:04:11.416 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BD0011400
2022-03-10 20:04:12.271 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-10 20:04:12.424 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-10 20:04:13.296 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-10 20:04:13.431 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-10 20:04:14.324 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-10 20:04:14.546 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B401A3DE6
2022-03-10 20:04:15.353 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-10 20:04:15.556 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740698
2022-03-10 20:04:16.375 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00000000
2022-03-10 20:04:16.559 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000000
2022-03-10 20:04:17.402 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
Re: OTGW system error
I would prefer to just get a log of the original failing case, without changing a whole bunch of things that may leave us in the dark about what the problem actually was.
I understand you simply want it to work. But I will not feel at ease, knowing there's a bug somewhere in the firmware that you worked around for now, but that may come back to bite someone at some point in the future.
I understand you simply want it to work. But I will not feel at ease, knowing there's a bug somewhere in the firmware that you worked around for now, but that may come back to bite someone at some point in the future.
Schelte
Re: OTGW system error
The OTGW has been running for ca. one week without error, and last night it went into lockout. I am sending CS and CH commands every 30 seconds which means the failure occurred after about 20'000 successful commands. Below is the log for about the last two minutes leading up to the failure. Interesting is the response at 17:29:20.012 just prior to the crash, which I think would normally be the response to a PR=A command, although the binding did NOT send any such PR command!!
Code: Select all
2022-03-17 17:27:50.247 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:27:50.275 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:27:50.275 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:27:50.295 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:27:50.798 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:27:51.026 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:27:51.816 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:27:52.030 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:27:52.837 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:27:53.040 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:27:53.867 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:27:54.049 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:27:54.889 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:27:55.040 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:27:55.912 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:27:56.055 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:27:56.937 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:27:57.272 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192C66
2022-03-17 17:27:57.970 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:27:58.283 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:27:58.991 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:27:59.293 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:00.014 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:00.297 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:01.042 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:01.306 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:02.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:02.303 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:03.088 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:03.306 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:04.118 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:04.314 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:05.138 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:05.323 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:06.164 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:06.440 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40192C19
2022-03-17 17:28:07.182 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:07.449 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:08.207 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:08.458 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:09.232 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:09.466 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:10.247 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:28:10.261 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:10.277 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:28:10.278 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:28:10.297 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:28:10.474 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:11.282 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:11.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:12.307 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:12.492 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:13.332 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:13.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192BE6
2022-03-17 17:28:14.356 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:14.605 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:15.381 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:15.611 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:16.401 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:16.621 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:17.427 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:17.628 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:18.452 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:18.636 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:19.479 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:19.646 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:20.502 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:20.654 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:21.526 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:21.758 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:22.560 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:22.761 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192BB3
2022-03-17 17:28:23.584 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:23.770 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:24.609 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:24.778 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:25.635 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:25.791 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:26.665 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:26.802 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:27.686 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:27.897 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:28.725 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:28.907 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0000102
2022-03-17 17:28:29.752 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:29.916 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192B80
2022-03-17 17:28:30.779 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:30.931 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:31.805 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:31.941 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:32.830 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:33.053 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:33.849 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:34.045 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:34.869 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:35.059 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:35.894 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:36.069 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:36.928 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:37.080 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40740893
2022-03-17 17:28:37.946 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:38.085 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:38.971 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:39.205 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0192D19
2022-03-17 17:28:39.991 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:40.209 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:41.016 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:41.221 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:42.041 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:42.208 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:43.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:43.217 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:44.089 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:44.226 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:28:45.117 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:45.344 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:46.135 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:46.357 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B401932CD
2022-03-17 17:28:47.162 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:47.362 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:48.186 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:48.371 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:28:49.216 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:28:49.383 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:28:50.241 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:28:50.243 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:28:50.269 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:28:50.270 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:28:50.289 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:28:50.393 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:28:51.260 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:28:51.490 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:28:52.294 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:28:52.500 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:28:53.320 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:28:53.506 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A33
2022-03-17 17:28:54.346 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:28:54.520 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:28:55.371 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:28:55.524 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193980
2022-03-17 17:28:56.397 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:28:56.626 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:28:57.432 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:28:57.634 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:28:58.457 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:28:58.642 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:28:59.483 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:28:59.652 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:00.508 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:29:00.661 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:29:01.533 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:01.673 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:02.559 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:02.774 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B40193933
2022-03-17 17:29:03.594 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:29:03.779 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:29:04.621 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:29:04.788 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A66
2022-03-17 17:29:05.648 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00110000
2022-03-17 17:29:05.797 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0110000
2022-03-17 17:29:06.669 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R001B0000
2022-03-17 17:29:06.805 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B601B0000
2022-03-17 17:29:07.695 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:29:07.928 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:08.720 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R900E6400
2022-03-17 17:29:08.931 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B500E6400
2022-03-17 17:29:09.738 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00780000
2022-03-17 17:29:09.939 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4078162D
2022-03-17 17:29:10.244 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CS=50.0
2022-03-17 17:29:10.270 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CS: 50.00
2022-03-17 17:29:10.271 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Sent: CH=1
2022-03-17 17:29:10.291 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: CH: 1
2022-03-17 17:29:10.765 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:10.930 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:11.792 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:11.945 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193866
2022-03-17 17:29:12.811 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R90013200
2022-03-17 17:29:12.948 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B50013200
2022-03-17 17:29:13.836 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80380000
2022-03-17 17:29:14.064 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0383700
2022-03-17 17:29:14.855 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00390000
2022-03-17 17:29:15.074 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0394100
2022-03-17 17:29:15.881 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R801A0000
2022-03-17 17:29:16.085 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC01A2A33
2022-03-17 17:29:16.907 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R00740000
2022-03-17 17:29:17.097 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0740894
2022-03-17 17:29:17.934 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80000100
2022-03-17 17:29:18.101 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: B4000010A
2022-03-17 17:29:18.954 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: R80190000
2022-03-17 17:29:19.107 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: BC0193A19
2022-03-17 17:29:20.012 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: OpenTherm Gateway 5.3
2022-03-17 17:29:20.013 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: OpenTherm Gateway 5.3, (unknown)
2022-03-17 17:29:20.015 [TRACE] [rnal.OpenThermGatewaySocketConnector] - Read: WDT reset!
2022-03-17 17:29:20.015 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - Received message: WDT reset!, (unknown)
2022-03-17 17:29:40.027 [WARN ] [rnal.OpenThermGatewaySocketConnector] - Error communicating with OpenTherm Gateway: 'Read timed out'
2022-03-17 17:29:40.038 [DEBUG] [rnal.OpenThermGatewaySocketConnector] - OpenThermGatewaySocketConnector disconnected
Re: OTGW system error
The log shows that the issue is unrelated to the CS command. The WDT reset happens 10 seconds after the last CS command. The watchdog timer runs out after 16ms. So the changes you made were barking up the wrong tree.
In fact, the logs show nothing special going on at all before the WDT reset happens. It's just plodding along, doing exactly the same things that many other OTGWs also do. And nobody else has reported a problem. Do you happen to have anything configured on the GPIO pins? For example, reading a temperature sensor happens in the background. So a potential problem there would not show up in the logs. In stand-alone mode, a temperature sensor has no added value, so I would suggest to disable it.
You indicated that the problem happens more frequently on v5.3 than on v5.2. However, the only change between the two version is in the handling of serial commands. But there is no serial command being performed when your OTGW resets. So, I'm starting to wonder if perhaps on V5.2 you saw the problem that was fixed. And on v5.3 there is a bit somewhere in the code that is not fully binary. Most of the time it reads correctly. But sometimes it gives a 0 instead of a 1, or the other way around. That causes the program to derail, which is then caught by the watchdog. This is just speculation, but the best idea I can come up with at the moment. If it is what is happening, it should hopefully be resolved by reflashing the firmware once more.
The "OpenTherm Gateway 5.3" is the normal banner the OTGW sends after a reset. The "WDT reset!" message is reported after the reset happens. And also after the banner is printed. So that is normal in case of a WDT reset. (Although a WDT reset is not normal.) You can tell it's the startup banner because the response to a PR=A command has some additional characters at the start: "PR: A=OpenTherm Gateway 5.3".
In fact, the logs show nothing special going on at all before the WDT reset happens. It's just plodding along, doing exactly the same things that many other OTGWs also do. And nobody else has reported a problem. Do you happen to have anything configured on the GPIO pins? For example, reading a temperature sensor happens in the background. So a potential problem there would not show up in the logs. In stand-alone mode, a temperature sensor has no added value, so I would suggest to disable it.
You indicated that the problem happens more frequently on v5.3 than on v5.2. However, the only change between the two version is in the handling of serial commands. But there is no serial command being performed when your OTGW resets. So, I'm starting to wonder if perhaps on V5.2 you saw the problem that was fixed. And on v5.3 there is a bit somewhere in the code that is not fully binary. Most of the time it reads correctly. But sometimes it gives a 0 instead of a 1, or the other way around. That causes the program to derail, which is then caught by the watchdog. This is just speculation, but the best idea I can come up with at the moment. If it is what is happening, it should hopefully be resolved by reflashing the firmware once more.
The "OpenTherm Gateway 5.3" is the normal banner the OTGW sends after a reset. The "WDT reset!" message is reported after the reset happens. And also after the banner is printed. So that is normal in case of a WDT reset. (Although a WDT reset is not normal.) You can tell it's the startup banner because the response to a PR=A command has some additional characters at the start: "PR: A=OpenTherm Gateway 5.3".
Schelte
Re: OTGW system error
> anything configured on the GPIO pins?
I have GPIO ports A & B mapped to LED E & F respectively, which are flashing on Transmitting & Receiving an OpenTherm Message respectively. Maybe that is too much GPIO / LED activity? Perhaps I should connect the GPIO's to 'none' instead ??
> indicated that the problem happens more frequently on v5.3 than on v5.2
I think that was a chance observation. On v5.2 I got one failure in two weeks. The on v5.3 I got three failures in one week. But with the logging enabled on v5.3 I got one failure in one week. So the difference is marginal.
> hopefully be resolved by reflashing the firmware
That sounds highly unlikely, but I can try it. I will let you know.
You also suggested an alternate 'solution' that might still work -- namely connecting 'Boiler Maintenance' to GPIO A or B, and running a jumper wire to the RST pin. Is that really likely to work? I sometimes observe a fast blink on the LEDs when power cycling the OTGW, so I fear that that might cause a perpetual spiral of self RST...
I have GPIO ports A & B mapped to LED E & F respectively, which are flashing on Transmitting & Receiving an OpenTherm Message respectively. Maybe that is too much GPIO / LED activity? Perhaps I should connect the GPIO's to 'none' instead ??
> indicated that the problem happens more frequently on v5.3 than on v5.2
I think that was a chance observation. On v5.2 I got one failure in two weeks. The on v5.3 I got three failures in one week. But with the logging enabled on v5.3 I got one failure in one week. So the difference is marginal.
> hopefully be resolved by reflashing the firmware
That sounds highly unlikely, but I can try it. I will let you know.
You also suggested an alternate 'solution' that might still work -- namely connecting 'Boiler Maintenance' to GPIO A or B, and running a jumper wire to the RST pin. Is that really likely to work? I sometimes observe a fast blink on the LEDs when power cycling the OTGW, so I fear that that might cause a perpetual spiral of self RST...
Re: OTGW system error
Having LEDs configured on the GPIO pins should not be a problem. They don't need complicated background processing.
I agree that an unstable bit is highly unlikely. But I can't think of any more likely reason, given the point at which it fails and the amount of other people that use the same functionality without problems.
After a reset, only LEDs A, B, C, and D flash. That's why I suggested to use one of the GPIO ports. But I haven't tested it. It may be easier to modify the firmware to not go into bare minimum mode after a WDT reset. Just report it and resume normal operation.
I agree that an unstable bit is highly unlikely. But I can't think of any more likely reason, given the point at which it fails and the amount of other people that use the same functionality without problems.
After a reset, only LEDs A, B, C, and D flash. That's why I suggested to use one of the GPIO ports. But I haven't tested it. It may be easier to modify the firmware to not go into bare minimum mode after a WDT reset. Just report it and resume normal operation.
Schelte
Re: OTGW system error
> may be easier to modify the firmware
Crikey! The last time I wrote anything in assembler was for the 6502 literally decades ago..
Crikey! The last time I wrote anything in assembler was for the 6502 literally decades ago..
Re: OTGW system error
In this case it's not so much writing assembly code, but deleting it. To create the attached firmware, I deleted the parts where, after a WDT reset, the code:
- Configures the comparators to pass-through mode.
- Skips enabling the serial receiver.
- Skips enabling interrupts.
- Switches on the maintenance LED.
- Takes a short-cut in the main loop.
- Attachments
-
- gateway-5.3.1.zip
- Firmware that continues to run normally after a WDT reset.
- (9.4 KiB) Downloaded 106 times
Schelte