Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

im running the December beta emonpi image and last night (around 9am uk time) all my RF received channels stopped updating but my MQTT ones kept working suggesting that most of the system was still alive. The RF module was still blinking its red led suggesting that the radio but was still alive too and a reboot fixed everything.

i recently updated the RFM to the latest version.

Is this a known problem? It's the first time I've seen it.

Paul Reed's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I'm currently experiencing the same issue - see this thread.

Paul

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

The fact that the RFM2Pi's LED continued to blink would suggest the issue is more likely to be on the Pi rather than the RFM2Pi. The best place to look should this reoccur is in emonhub.log. I suspect the RFM2Pi thread may have died but we cannot confirm that as after a reboot, the read-only OS of the emonPi image clears all the logs and any useful info with it.

@Paul - are you still having trouble? I hoped no news was good news as there were no further reports.

Paul

Paul Reed's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I've just updated my thread Paul, but yes, this does seem different as my LED stops flashing.

Paul

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@PB66, I'm guessing I need to change the logging as currenly /log/emonhub/emonhub.log just contains

2016-01-29 21:44:18,981 INFO     MainThread EmonHub emonHub 'emon-pi' variant v1.1
2016-01-29 21:44:18,983 INFO     MainThread Opening hub...

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I also get this issue. 9.3 installation following the raspi installation guide on a PiB on Jessie Lite with USB HDD.

In my case, the rfm12 led is flashing at the 5 second interval but if I look at the emoncms Input View I see that the inputs haven't been updated for a long time. Last time I hadn't checked things for a while and all the inputs were 31 hours since last update (note to self - must set up a notification when this occurs!)

emonhub log shows some errors over the previous hours mainly to do with emoncms send failures, e.g.

2016-01-30 02:49:44,762 WARNING emonCMS send failure: wanted 'ok' but got ''
2016-01-30 03:01:34,797 WARNING emonCMS couldn't send to server, Exception: Traceback (most recent call last):
  File "/home/pi/emonhub/src/emonhub_reporter.py", line 227, in _send_post
    response = urllib2.urlopen(request, timeout=60)   etc. etc. etc.

emonhub status gives:

pi@Homepi-emoncms:~ $ sudo service emonhub status
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Fri 2016-01-29 16:59:01 GMT; 1 day 18h ago
  Process: 331 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─378 python /usr/share/emonhub/emonhub.py --logfile /var/log/emonhub/emonhub.log --...

Jan 29 16:58:58 Homepi-emoncms emonhub[331]: Starting OpenEnergyMonitor emonHub: emonhub has...ok.
Jan 29 16:59:01 Homepi-emoncms systemd[1]: Started LSB: Start/stop emonHub.
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: Traceback (most recent call last):
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: File "/usr/share/emonhub/emonhub.py", line 338,...le>
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: hub.run()
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: File "/usr/share/emonhub/emonhub.py", line 90, in run
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: values = I.read()
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: File "/home/pi/emonhub/src/emonhub_interfacer.p...ead
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: self._sock_rx_buf = self._sock_rx_buf + conn.re...24)
Jan 31 03:12:06 Homepi-emoncms emonhub[331]: socket.error: [Errno 104] Connection reset by peer
Hint: Some lines were ellipsized, use -l to show in full.

Is there anything else I can check to see what emonhub is doing?

sudo service emonhub stop, then start brings things back to life, inputs start updating every 5 seconds or so.

Simon

 

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Sheppy - yes, I would set a loglevel=DEBUG just to be sure I captured as much info as possible, the loglevel=WARNING may tell you "it's broke" but not give you any insight as to why it broke, in these instances the more info the better usually. 

@ Simon - I don't think this is the same issue either because the "break" in your route is between emonhub and emoncms, this is unlikely to be able to differentiate between mqtt and rfm, this thread is about rfm stopping whilst mqtt continues so the issue is most likely between emonhub and the rfm2pi.

However the same applies about the loglevel and the lead up to the fault in your log. the 2 emonhub messages you posted tell us emonhub didn't get the expected response from emoncms and what that response was. It does not tell us if what it sent was likely to get a good response.

Plus sheppy is using the emon-pi variant on the 22dec image not genuine emonhub v1.2 on Jessie-lite. I am pretty confident that your issue will be with (reaching) emoncms not emonhub. possibly another dns/dhcp issue with "lite".

If you are also still having "rfm only" issues as well then we need to revisit your existing RFM2Pi thread.

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Hi paul, you're right I shouldn't have hijacked this thread. My issues with rfm have gone away and now I only have this type of failure.

Although I think I need some more info from you, When you say 'your route is between emonhub and emoncms, this is unlikely to be able to differentiate between mqtt and rfm' and 'the 2 emonhub messages you posted tell us emonhub didn't get the expected response from emoncms and what that response was. It does not tell us if what it sent was likely to get a good response.' Can you be a bit more specific. I'd be glad to start up a new thread on these types of issues.

Simon

PS you keep accusing me of having dhcp and dns issues, which i'm sure I haven't got at all   accusing is perhaps the wrong word but it'll have to do  :-).  There was a dns issue on wget which I tracked down to my ISP virgin media with the help of the pi forums but that wouldn't affect this.

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@pb66 It hasn't failed so far so nothing to report. I'll try and establish a failure pattern and then turn the logging on when a failure is due. Its been up for almost 48 hours so far. Are there any services I can restart when it does fail to narrow it down a bit? 

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@ sheppy - I'd be inclined to just change the loglevel now, you do not need to restart and you can always change it back further down the line if no longer required. The main benefit of the log is to gather data unattended and continuously so that when the issue occurs unexpectedly, data has already been captured that may help establish the cause or pattern.

At the moment we have zero info so I couldn't say what would be a valid test, the logs from when/if it fails will hopefully give us some direction, even if they contain no error codes the continued logging of certain messages may confirm or eliminate certain possibilities.

@Simon - I only suggest a possible dns/dhcp issue when it fits the symptoms, like your example when "There was a dns issue on wget" or when your emonhub warnings and exceptions clearly stated "URLError: [Errno -2] Name or service not known".

If there is a problem with your set up that is preventing emonhub communicating with an emoncms instance on the same machine via http, then the first thing that sprung to my mind was a dns/dchp type issue, a thought I shared with you in hope it may help not offend, my use of the term "another" only casually refers to it not being the first time you will have encountered an issue of this type IF indeed it is of this type, not that is necessarily related.

A new thread maybe a good idea and was the reason for pointing out the difference in issues the first of those 2 comments. Since the messages state emonhub did not get a valid response from emoncms, it is highly unlikely to be specific any one type of input and not others (ie rfm not mqtt as the op reports) and I also commented that emoncms's non-response might actually be due to a bad request from emonhub. I'm not suggesting that is (or isn't) the case, it probably (but not definitely) isn't, but I was only pointing out the information to make a quick and very useful observation isn't present in the short piece of log provided, which was inline with what I was saying to Sheppy about the loglevel at that point.

Paul

 

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@pb66 Debug Logging turned on - presumably it will trim it before the tmpfs /var/log runs out of space?

Also is it worth turning on quiet mode? (how?) There's a lot of discarded packets in there

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

emonhub self manages it's logfiles, rotating them at 5mb although if using the emonpi image I believe there may be log rotation at 1mb. Either way, no, emonhub cannot consume the whole /var/log tmpfs partition..

Tough call to make on quiet mode. Yes, less packets may be less difficult to sift through, but if your loss of data is due to packets no longer being seen as valid, or a potential serial issue, the "noise" may be useful. Flip a coin, or go with your instinct if there really are that many. It can be enabled/disabled by setting "quiet = true" or "quiet = false" in the runtimesettings of the RFM2Pi in emonhub.conf. By default, quiet mode is active (true) if there is no entry.

Paul

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I'll leave it as is, especially as details on this issue are lacking.

 

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Paul, thanks for the (as usual) clear explanation.

Having said that I'm not sure that my issue is different to Sheppy's.

The system stopped passing through data again soon after the last time. I had changed the emonhub log setting to DEBUG in the meantime. You'll see the log below.

Essentially everything was running fine, no warnings about not being able to reach emoncms etc. when suddenly the log stops. You'll see the restart a few hours later. I've highlighted in BOLD the point where I did the service emonhub stop and start to get things functioning again. I haven't done a reboot on the system for a few weeks, just stopped and started emonhub if there has been an issue.

If this occurs again in the future is there any arcane bit of Linux I can use to get an idea of what has happened. You mention the rfm2pi thread in one of your earlier replies. Can I see if that is still running? Or are there any other logs I can look at?

Simon

2016-02-01 03:41:44,253 DEBUG 7585 NEW FRAME : 1454298104.25  10 46 1 0 0 0 0 0 0 189 94 178 1 122 2 199 1 153 2 114 1
2016-02-01 03:41:44,270 DEBUG 7585 Timestamp : 1454298104.25
2016-02-01 03:41:44,273 DEBUG 7585      Node : 10
2016-02-01 03:41:44,275 DEBUG 7585    Values : [302, 0, 0, 0, 24253, 434, 634, 455, 665, 370]
2016-02-01 03:41:44,373 DEBUG 7585 Append to 'emonCMSlocal' buffer => time: 1454298104.25, data: [10, 302, 0, 0, 0, 24253, 434, 634, 455, 665, 370], ref: 7585
2016-02-01 03:41:44,477 INFO emonCMSlocal sending: http://localhost/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1454298104.25,10,302,0,0,0,24253,434,634,455,665,370]]&sentat=1454298104
2016-02-01 03:41:44,667 DEBUG emonCMSlocal acknowledged receipt with 'ok' from http://localhost/emoncms
2016-02-01 03:41:45,567 DEBUG emonCMS acknowledged receipt with 'ok' from http://emoncms.org
2016-02-01 03:41:45,580 DEBUG 7585 Append to 'emonCMS' buffer => time: 1454298104.25, data: [10, 302, 0, 0, 0, 24253, 434, 634, 455, 665, 370], ref: 7585
2016-02-01 03:41:45,684 INFO emonCMS sending: http://emoncms.org/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1454298104.25,10,302,0,0,0,24253,434,634,455,665,370]]&sentat=1454298105
2016-02-01 03:41:45,766 DEBUG emonCMS acknowledged receipt with 'ok' from http://emoncms.org
2016-02-01 08:08:19,343 DEBUG SIGINT received.
2016-02-01 08:09:09,476 INFO EmonHub Pre-Release Development Version (rc1.2)
2016-02-01 08:09:09,479 INFO Opening hub...
2016-02-01 08:09:09,492 INFO Logging level set to DEBUG
2016-02-01 08:09:09,495 INFO Creating EmonHubEmoncmsReporter 'emonCMS' 
2016-02-01 08:09:09,512 INFO Set up reporter 'emonCMS' (buffer: memory | size: 1000)
2016-02-01 08:09:09,523 INFO Setting emonCMS url: http://emoncms.org
2016-02-01 08:09:09,525 INFO Setting emonCMS apikey: set
2016-02-01 08:09:09,528 INFO Creating EmonHubEmoncmsReporter 'emonCMSlocal' 
2016-02-01 08:09:09,544 INFO Set up reporter 'emonCMSlocal' (buffer: memory | size: 1000)
2016-02-01 08:09:09,563 INFO Setting emonCMSlocal url: http://localhost/emoncms
2016-02-01 08:09:09,565 INFO Setting emonCMSlocal apikey: set
2016-02-01 08:09:09,568 INFO Creating EmonHubJeeInterfacer 'RFM2Pi' 
2016-02-01 08:09:09,574 DEBUG Opening serial port: /dev/ttyAMA0 @ 9600 bits/s
2016-02-01 08:09:11,580 INFO RFM2Pi device firmware version & configuration: not available
2016-02-01 08:09:11,583 INFO Setting RFM2Pi frequency: 868 (8b)
2016-02-01 08:09:12,587 INFO Setting RFM2Pi group: 210 (210g)
2016-02-01 08:09:13,590 INFO Setting RFM2Pi quiet: 1 (1q)
2016-02-01 08:09:14,594 INFO Setting RFM2Pi baseid: 15 (15i)
2016-02-01 08:09:15,598 INFO Creating EmonHubSocketInterfacer 'Socket' 
2016-02-01 08:09:15,601 DEBUG Opening socket on port 20100
2016-02-01 08:09:15,648 DEBUG RFM2Pi acknowledged command: > 8b
2016-02-01 08:09:16,154 DEBUG RFM2Pi acknowledged command: > 210g
2016-02-01 08:09:16,634 DEBUG RFM2Pi acknowledged command: > 1q
2016-02-01 08:09:16,900 DEBUG RFM2Pi acknowledged command: > 15i
2016-02-01 08:09:17,407 DEBUG 1 NEW FRAME : 1454314157.41  10 244 3 0 0 0 0 0 0 204 94 69 1 81 1 68 1 176 2 19 2
2016-02-01 08:09:17,424 DEBUG 1 Timestamp : 1454314157.41
2016-02-01 08:09:17,426 DEBUG 1      Node : 10
2016-02-01 08:09:17,429 DEBUG 1    Values : [1012, 0, 0, 0, 24268, 325, 337, 324, 688, 531]
2016-02-01 08:09:17,504 DEBUG 1 Append to 'emonCMS' buffer => time: 1454314157.41, data: [10, 1012, 0, 0, 0, 24268, 325, 337, 324, 688, 531], ref: 1
2016-02-01 08:09:17,539 DEBUG 1 Append to 'emonCMSlocal' buffer => time: 1454314157.41, data: [10, 1012, 0, 0, 0, 24268, 325, 337, 324, 688, 531], ref: 1
2016-02-01 08:09:17,607 INFO emonCMS sending: http://emoncms.org/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1454314157.41,10,1012,0,0,0,24268,325,337,324,688,531]]&sentat=1454314157
2016-02-01 08:09:17,643 INFO emonCMSlocal sending: http://localhost/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1454314157.41,10,1012,0,0,0,24268,325,337,324,688,531]]&sentat=1454314157
2016-02-01 08:09:17,852 DEBUG emonCMSlocal acknowledged receipt with 'ok' from http://localhost/emoncms
2016-02-01 08:09:18,116 DEBUG emonCMS acknowledged receipt with 'ok' from http://emoncms.org

 

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I need to think about this some more, I can see no reason why if it is able to log the SIGINT why it isn't logging a graceful shutdown of emonhub. I need to find out some more info on Jessie and possibly why the SIGINT is being used over SIGTERM, and if that is a problem. I have no clue as to why this has suddenly changed, there has been no change to emonhub v1.2. Clutching at straws a bit here, but maybe the serial port not being gracefully closed is causing a serial communication error when trying to re-use the open but orphaned port and on this occasion it has eventually been closed by the OS prior to trying a restart on emonhub some 4+hrs later. I do not even know if the OS can do that, I'm just putting the clues together.

if it occurs again can you also do a "df -h" to check there is still some available disk space just in case it's choking?

Also can you set an "interval = 600" in the RFM2Pi runtimesettings, it will transmit the time for a glcd every 10mins, doesn't matter if you have one or not, it will cause emonhub to try and do something every 10mins even if theres no data coming in. I'm not suggesting this is the case but the log finishes quite tidily with an "ok" from emoncms as if it was patiently awaiting data from the rfm2pi for over 4hrs, the attempts to send the time would be proof of life and also the response to the attempts to tx the time will give insight to whether the rfm2pi is on-line.

Paul 

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Will do Paul. Will be good to get to the bottom of this.

One thing I did look at but when things were running again was to use top to see if anything is hogging the system. I think you had also seen some latency with SSH on Jessie. I'm not sure if I am reading the output correctly but the emonhub/emoncms bits seem to be taking less than 15% of CPU. So plenty of bandwidth there.

I'll report back if and when things go awry again.

Simon

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@PB66 in case it affects your thinking my issue happened with EMONCMS locally and @Bramco appears to use emoncms.org so I'd guess that can be ruled out as a cause. I've just switched to a newly built image so it will be a few days before I can offer anything meaningful.

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@sheppy.  I use both. Local and .org.

Simon

sheppy's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Simon. Presumably they both stop updating?

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Sheppy - yes it affects both of them.

Here's the output from the df -h

pi@Homepi-emoncms:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.1G  1.3G  3.6G  26% /
devtmpfs         87M     0   87M   0% /dev
tmpfs            91M     0   91M   0% /dev/shm
tmpfs            91M  4.5M   87M   5% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            91M     0   91M   0% /sys/fs/cgroup
/dev/mmcblk0p1   60M   20M   41M  34% /boot

Although I'm not sure I understand this as the system has been transferred to the HDD, so the sizes don't seem right.

Also attached is the emonhub log.

Take a look at 14:30 yesterday where I stopped and restarted the hub to add the interval = 600. This shows a normal stop and restart (unlike the one above, or the one at 08:22 today). Also from the log you can see that emonCMSLocal gets it's data in a big chunk every 10 minutes, which is I guess what you'd expect from the interval = 600 setting.

Everything seems to run normally like this, sending to emonCMS (.org) on every packet and to emonCMSLocal every 10 minutes until you get to 19:19 yesterday evening when the emonCMS sends stop and the new frame reporting stops. But sending to emonCMSLocal in 10 minute batches continues at 19:22 and then every 10 minutes until everything finally grinds to a stop at 20:23. (There does seem to be a disjoin between the last timestamp to emonCMS and the first time stamp in the next batch to emonCMSLocal.)

Anyway, over to you Paul, any ideas?

Simon

Paul Reed's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Although I'm not sure I understand this as the system has been transferred to the HDD, so the sizes don't seem right.

Have you expanded your filesystem to fill the HDD?
GParted is a good graphical tool to use, or it can be done using resize2fs via the command terminal.

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Hadn't done that Paul, I was under the impression the script to switch from SD to HDD did that for you?

I'll use GParted next time I have to restart. Thanks for the tip.

Simon

PS If the script doesn't do the expand, maybe this tip should be added to the instructions for switching from SD to HDD.

Paul Reed's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

The script does expand the filesystem, but in the HDD guide it stresses that it applies to systems installed via the Raspberry Pi installation guide, which is either Wheezy or Jessie, and I think that you are using Lite?
The script has not been tested using Lite, so probably doesn't work fully, as it relies upon other software being available in the OS, which maybe has been stripped out in Lite.

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

I'll give it a whirl later today when I'm home, I have the image saved so can quickly transfer it to an SD. I can't remember seeing any errors when I did it last time but if there are I'll post it in a separate issue.

Simon

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Simon - not quite sure how you ended up putting the "interval = 600" in the "emonCMSlocal" runtimesettings rather than the "RFM2Pi" runtimesettings but I believe it has proved the same point just not as cleanly or clearly.

Also you can set any runtimesettings without restarting emonhub, when you look at the log the "interval = 600" setting is acknowledged 15 secs before you restart emonhub. to see this in action open 2 ssh windows at once and run the command " tail -f /var/log/emonhub/emonhub.log in one and then in the other open emonhub.conf and edit some runtimesettings, each time you save the conf file the settings are picked up and applied to emonhub whilst running. 

The addition of a "interval = 600" in the RFM2Pi runtimesettings as originally suggested may still shed a little more light if you wish to try that.

You could also remove the "interval = 600" from the "emonCMSlocal" runtimesettings, I recommend doing that as a separate "save" so the first action is easier to spot (this latter change will cause a rush in traffic to the local server as the restriction is lifted more on this below)

At 19:19:07 the last packet is processed from the rfm2pi, why that is the last packet I do not know at this point, perhaps an issue with the device, the serial port or indeed emonhub itself, however emonhub does continue to function as a backlog of packets has built up in the emonCMSLocal buffer due to the 10min posting rate set by the "interval = 600" and they continue to be delivered in batches of 100 every 10mins. However that continued delivery is in it's own child thread and the main parent thread may have stopped or is being blocked, I have never seen it occur and would not normally consider it an option but the lack of graceful shut down messages suggests emonhub has stopped, maybe.

Although at this point I do not think is is related and may even be detracting from the issue, but there are numerous "dns/dhcp type" error messages of more than one type, dotted throughout the log for both the local and remote emoncms connections. I know that's not something you want to hear but it is the case none the less.

I have attached your log with some notes throughout, it wasn't done in a way to publish it was just me adding markers whilst trying to decypher the info. If you use find or search in your text reader to search/find "????" all my comments start "????".

There are some "tracebacks" that look like a failure but they are have been caught and handled correctly, the traceback is provided in th e log for debugging only.

Originally I said this was unlikely to be the same issue as Sheppy's based on the few facts provided of which one was the "emonCMS timeout" error message you provided, I think ""those"" types of error message are unrelated to this issue and for the time being more likely to be just a characteristic of your setup.

What is the socket interfacer? the "sudo service emonhub status" log you provided seems to mention reading sockets but although I see one created in your log it doesn't seem to be used (in this log) is it worth disabling it for the moment if it's not being used? if so just add a "#" (hash) in front of the "Type = EmonHubSocketInterfacer" line in emonhub.conf, this has the effect of hiding the configuration from emonhub as every interfacer must have a Type and any that don't are ignored.

Please provide a new log of the error with the new settings and also the "sudo service emonhub status" mini log might be useful.

Paul

 

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Paul,

Thanks for the clear explanations and apologies for putting the interval = 600 in the wrong place. Schoolboy error. I will rectify that and set things off again. I'll also comment out the socket interfacer, this is for my esp8266 Heat Bank controller which is reporting elsewhere until I get this thing going properly.

I can only see dns/dhcp type errors for the .org sends, all of only this type:

2016-02-03 09:02:16,066 WARNING emonCMS couldn't send to server, URLError: [Errno -2] Name or service not known

Am I missing something? We're with Virgin Media and their email was down yesterday so probably other things too which may have been causing these errors. And as you say are probably a DNS lookup problem and aren't detrimental to the whole as the next send works fine, so it's just a glitch which the code gets over.

Anyway, let's see if we get a log of a fail with the interval in the right place. Thanks again for taking the time to look at this.

Simon

PS Not sure if you've seen the jeelib weblog recently but there's a very interesting development from Jean-Claude Wippler around his Jet/Hub concept. You may want to take a look before embarking on the next version of emonhub.

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

@Paul Reed.

The USB HDD script worked fine on another system (Pi2) with a Jessie Lite image, so I'm not sure what happened on the PiB I'm running things on.

Next time I need to restart the system I'll check it out on the PiB although I can't see how the system should have any impact on the script.

Simon

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Failed again Paul...

I'm going to try swapping the power supplies on the powered usb hub and the Pi to make sure it's not anything as daft as that. Can't see why it should cause any problems but you never know. I can also try running it just on the SD card.

pi@Homepi-emoncms:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.1G  1.3G  3.6G  26% /
devtmpfs         87M     0   87M   0% /dev
tmpfs            91M     0   91M   0% /dev/shm
tmpfs            91M  4.5M   87M   5% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            91M     0   91M   0% /sys/fs/cgroup
/dev/mmcblk0p1   60M   20M   41M  34% /boot
pi@Homepi-emoncms:~ $ sudo service emonhub status
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Fri 2016-02-05 09:35:28 GMT; 22h ago
  Process: 6156 ExecStop=/etc/init.d/emonhub stop (code=exited, status=0/SUCCESS)
  Process: 6197 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─6209 python /usr/share/emonhub/emonhub.py --logfile /var/log/emonhub/emonhub.log -...

Feb 05 09:35:28 Homepi-emoncms emonhub[6197]: Starting OpenEnergyMonitor emonHub: emonhub ha...ok.
Feb 05 09:35:28 Homepi-emoncms systemd[1]: Started LSB: Start/stop emonHub.
Hint: Some lines were ellipsized, use -l to show in full.

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

And again - from strange to weird.

Emonhub has managed to change my rfm12pi to 433 from 868Mhz which of course stops it receiving anything.

You can see in the attached log how first of all it stops receiving while this time the interval 600 keeps on working but then some spurious commands seem to go out to the rfm12pi which changes it's frequency.

df and status as below. (I did a sudo shutdown -r now after the first failure this morning.)

Simon

pi@Homepi-emoncms:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.1G  1.3G  3.6G  26% /
devtmpfs         87M     0   87M   0% /dev
tmpfs            91M     0   91M   0% /dev/shm
tmpfs            91M  4.5M   87M   5% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            91M     0   91M   0% /sys/fs/cgroup
/dev/mmcblk0p1   60M   20M   41M  34% /boot
pi@Homepi-emoncms:~ $ sudo service emonhub status
● emonhub.service - LSB: Start/stop emonHub
   Loaded: loaded (/etc/init.d/emonhub)
   Active: active (running) since Sat 2016-02-06 08:06:16 GMT; 3h 18min ago
  Process: 330 ExecStart=/etc/init.d/emonhub start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/emonhub.service
           └─372 python /usr/share/emonhub/emonhub.py --logfile /var/log/emonhub/emonhub.log --...

Feb 06 08:06:13 Homepi-emoncms emonhub[330]: Starting OpenEnergyMonitor emonHub: emonhub has...ok.
Feb 06 08:06:16 Homepi-emoncms systemd[1]: Started LSB: Start/stop emonHub.
Hint: Some lines were ellipsized, use -l to show in full.

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

"Emonhub has managed to change my rfm12pi to 433 from 868Mhz" 

Where are you seeing this? do you have another log as your post suggests this has happened more than oncw this morning.

I see that 7:15 all still seemed well and another "normal" response was recieved by broadcasting the time

2016-02-06 07:15:50,897 DEBUG RFM2Pi broadcasting time: 07:15
2016-02-06 07:15:51,119 DEBUG RFM2Pi acknowledged command: > 0s
2016-02-06 07:15:51,349 DEBUG RFM2Pi confirmed sent packet size:  -> 4 b

The "0s" command to send (with no request for an ack) the preceding csv (ie 0,7,15,0,) to node 0 (ie broadcast to all nodes) is acknowledged by the "> 0s" returned from the rfm2pi and the packet size was also confirmed in the following line with "-> 4 b" as 4 bytes correct for "0,7,15,0"

emonHub then continues to receive packets for the next 10mins until the last packet at 07:25:46, 6secs later it broadcasts the time at it's 10min interval (600s) all appears normally up to this point, But the response from the rfm2pi is very odd indeed. it acknowledges a command of "0b" 3 times, these responses are the only "bad" thing I can see besides the lack of received packets beyond this point. There are another 2 successfully sent and confirmed (both command and size) time broadcasts at 7:35 and 7:45. which suggests despite the lack of received packets, both the rfm2pi and emonhub "continue" to function.

I use the term continue but I cannot rule out the rfm2pi is restarting at some point. If there were a crash recovery and the eeprom stored config, freq, baseid and group were not available it would probally revert any unavailable settings to 433b, 15i or 210g as these are hard coded into the firmware. If we refer back to the "rfm2pi goes hard down" thread, only those users not sticking to the default 15i, 210g and 433b that get "changes" occuring and as I pointed out there, it is more likely the same thing is occuring to all users but it's undetectable on set-ups that stick to the defaults.

If the group, node or frequency were being changed and that was the cause of the original complaint then this could be proved by simply revising the settings in emonhub.conf, eg is you suspect the freq is wrong then change it to something other than what it should be, save the conf and then change it to what it should be again. emonhub will only send the "new" setting if there is a detected change in the conf file, so it must change to the correct setting to make emonhub reapply it.

Going back to what we see in emonhubs log the time is sent from emonhub via a hardcoded command that just has the hour and minutes values changed according to time so it is unlikely for that string to morph into something else to result in the 3 "0b" commands. 

Even if the commas of the csv could be mistaken as "b"s by the rfm2pi there would be a 1 min window at midnight where all the values were "0" and even then there would probably be either 4 "0b"s or the 3 "0b"s would be followed by a broadcast transmission of "0" bytes.

Plus you are seeing an issue that was there prior to the "send time" code being activated so if this is a valid fault caused by the time sending then it is another issue on top of what you were already seeing.

Looking at the way the "b" command is processed in the current firmware, the "0b" commands would not result in any changes anyway. the "if(value)" would fail as the value is "0". 

If emonhub was to change the frequency setting I would expect lines like this (taken from your previous log)

2016-02-02 14:30:41,985 INFO Setting RFM2Pi frequency: 868 (8b)
2016-02-02 14:30:46,062 DEBUG RFM2Pi acknowledged command: > 8b

Right now I suspect the rfm2pi is at fault, although the evidence is far from conclusive, I was previously confident about emonHubs performance and less confident of any rfm2pi firmware, the clues we see do seem to fit that idea more than they cause me to question it.

The other log may show something different, or future logs may shed some light perhaps.

Would you consider installing a later firmware to the rfm69pi that you are not currently using???

We could write some extra logging into your emonhub to see exactly what is sent to the serial port at the time of the "send time" if we see this fail coincide with a send again but I do not feel that is the cause as it wasn't originally active.

Or add a line to emonhub to log everything received from the rfm2pi rather than discard anything not relevant to emonhub, but I feel these efforts would do very little even if they do reveal anything (and I'm not too sure they would) as we cannot change anything in the rfm2pi without uploading new firmware.

The systend mini log and the df -h do not show anything odd, in fact the systemd log seems to confirm there was no perceived issues during either the 22hrs or the 3hrs 18mins emonhub was running.

Sorry, I wish I could put my finger on the exact cause but currently the only smoking gun I see is the one used to put another hole in the messenger :-)

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Hi Paul,

Apologies for the slow response. Life tries to continue even though my emoncms sysadmin work wants to take over.  :-)

Anyway, where to start, I guess where you left off....  Smoking gun, I see no smoking guns around here...  Sounds like the wide mouthed frog joke...

More seriously though, the system crashed again yesterday. I've attached the log again and leaving the interval timer running begins to shw some patterns.

Take a look at these extracts:

2016-02-07 22:48:24,049 DEBUG RFM2Pi broadcasting time: 22:48
2016-02-07 22:48:24,275 DEBUG RFM2Pi acknowledged command: > 22b

 

2016-02-07 22:58:27,321 DEBUG RFM2Pi acknowledged command: > 0b
2016-02-07 22:58:27,762 DEBUG RFM2Pi acknowledged command: > 58b

Fortunately the rfm12 has at some stage been switched to 433, so the log isn't cluttered with received buffers as it's long since given up the ghost on that front.

In the log, the 1st failure happens around 9:59 yesterday with the 0bs and 0ss and after that I did a sudo shutdown -r.

At 13:07 it goes again and you can see the rfm12 responses start to match parts of the time and packets stop, so somehow the rfm12 has reverted to 433MHz

Then you get this sequence, which shows the rfm12 acking an 8b command and obviously reverting to 868MHz as packets start to arrive 5 seconds later.

2016-02-07 18:08:21,477 DEBUG RFM2Pi acknowledged command: > 8b
2016-02-07 18:08:25,014 DEBUG 709 NEW FRAME : 1454868505.01  10 197 8 0 0 0 0 0 0 231 93 210 1 132 2 210 1 166 2 119 1

And then 10 minutes later it goes again, although this time we see a 1b command which will change the band although it's not a valid b command.

2016-02-07 18:18:10,175 DEBUG emonCMS acknowledged receipt with 'ok' from http://emoncms.org
2016-02-07 18:18:21,282 DEBUG RFM2Pi broadcasting time: 18:18
2016-02-07 18:18:21,504 DEBUG RFM2Pi acknowledged command: > 1b
2016-02-07 18:18:21,946 DEBUG RFM2Pi acknowledged command: > 0b
2016-02-07 18:28:21,355 DEBUG RFM2Pi broadcasting time: 18:28
2016-02-07 18:28:21,577 DEBUG RFM2Pi acknowledged command: > 0s
2016-02-07 18:28:21,799 DEBUG RFM2Pi confirmed sent packet size:  -> 4 b

So all told I'm not sure I'm any clearer, except in this log we've seen some reasons as to why the rfm12 changes band.

Following on from what you said in your earlier analysis, I guess it could be that the rfm12 is restarting but somehow with the various b commands being received, I'm not sure about this. Doesn't the rfm12 store the settings in EEPROM? And if so wouldn't it simply restart in 868.

Or is all of this just some flaky connection or power supply. I did swap the power supply a few days ago when I moved the system onto my work bench, maybe I'll try the cable and a different usb hub to see if it is down to the those types of issues.

Smoking guns, anyone seen any smoking guns?   :-)

Simon

 

 

 

 

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Ok this log has far more interesting bits. I can see some of the time updates do result in incorrect acknowledgements, where it could be tripping up isn't really clear, is emonhub sending the correct packet? is the packet arriving intact at the rfm2pi? or is the rfm2pi misinterpreting the packet?

The time sends themselves cannot be the root of the issue, as previously pointed out your issue pre-dates the activation of the time sending code and the time send packet is hardcoded so that only the hours and minutes values are changed, the csv commas and the trailing s command are fixed in the write to serial port code.

There are many instances that do not make any sense, to point where the responses may just be garbage without an obvious pattern. I have studied alot of emonhub.logs for many reasons as you can imagine and I have never seen "rogue acknowledgements", given I wasn't looking for them I cannot say they do not exist but I would have certainly expected to have noted some along the way.

I can see from the help-text and current configuration text blocks that the frequency has been changed to 433, but at no time was there a logged "4b" command nor an acknowledgement to that effect, there are many, many "0b"s and some "1b"s and although they may not show the command being sent the commands are acknowledged, so why not the "4b" even the "8b"s are always acknowledged.

There are instances where the time is sent and the minutes OR the hours get morphed into a b command, but never both, why do some get morphed into 4 x "0b"s ? ie what happened to BOTH the time values.

There is an instance where "0i" is set, this is an issue that was fixed in the last commit to the oem rfm2pi firmware, that causes a "failed configuration" and will not recover untill a valid node id is set (1 to 30), or a reboot/crash forces the use of eeprom values.

You comment above about the settings being stored in the eeprom is correct, IF that is functioning correctly that is what is supposed to happen, as you'll see in the other thread there is a question over the rfm2pi forgetting it's config and reverting to the hardcoded values, highlighted by Glyn using a version that did not store it's config to the eeprom.

The help text for your device clearly states <nnn>g will set the group but looking closer at your logs the 210g command never works, nor does the "q" but that was absent from some firmwares.

Looking back at your first "600" log there is an instant where at 8:23 3rd feb the group and quiet commands are sent

2016-02-03 08:23:22,242 INFO Setting RFM2Pi group: 210 (210g)
2016-02-03 08:23:23,246 INFO Setting RFM2Pi quiet: 1 (1q) 

and they are somehow morphed into one

2016-02-03 08:23:29,763 DEBUG RFM2Pi acknowledged command: > 211q

​This is when the interval = 600 was in the reporter so there was no time sending, but the way emonhub processes settings in a loop it uses a fixed pattern to write to the serial port with only the command in the brackets inserted and sent each loop before it reads the next setting. At the other end in the rfm2pi firmware each char is processed individually and any "command character" can trigger the processing of a command using the previous value(s). eg 8b is a "b" command and the last value was 8 so "8b" it is.

This leads me to believe the rfm2pi had taken the leading "21" of the "210g" and not seen or ignored the rest until the "1q" arrived making the command "211q", whether that is a serial issue or a firmware issue or an hardware issue I couldn't say based on what we have.

There also seems to be alot of coincidence suggesting reuse of variables perhaps, why do the time sends result mainly in "b"s and occasionally an "i"? why any other command character eg "a"?

I do not see enough of a pattern to determine the cause. Unless you can monitor the serial traffic somehow you need to change one end or the other to move forward, v1.2 emonhub is being used reliably on many systems and is less likely to be the cause based on the little evidence we have. Your rfm2pi is running an unknown out of date firmware that may (or may not) be the cause.

You have a spare device and the quickest and most sensible test you can do right now is to load the firmware that Paul Reed is testing for a similar issue, see "Feeds stop updating" and try that before anything else. the data we are gathering seems too random to be of use and if it does turn out to be a problem with a firmware that should be out of service then we have all wasted out time.

I am not saying it is definitely the rfm2pi firmware, nor am I saying the firmware Paul is testing will fix it. But there is a long standing issue with a number of rfm2pi's that predates your setup and emonhub so it would seem the best place to start is there unless there is evidence to suggest something different to tackle.

It could possibly be power related I guess, but since the Pi is more power hungry and less "single minded" than the AVR I would of expected trouble to manifest in other ways too, but I suppose it also depends on how robust the hardware is to low voltage? It may also be a "Jessie" or "Jessie-lite" issue as these are still relatively new ventures and may require "tweeks" to the way we have been doing things in wheezy, including emonhub. And yes it could be an emonhub issue too which I would be very keen to resolve if it turns out to be the case.

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Hi Paul,

Yes I can bet you've seen your share of emonhub logs in your time!

I think we can rule out power supply issues as I swapped out both supplies (Pi and USB hub) and it's still failing.

I've been watching the other thread with interest and will try updating the firmware as outlined there. I'll let you know if there is any progress. Might try it on the spare rfm69 as you suggested earlier. 

Simon

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Yes if you are tempted to try the firmware update then I would recommend the rfm69pi, it's not in service so you can swap it in and then out again if it's worse, leaving your existing rfm12 rfm2pi as it is until we are a little more sure of the situation means you have little to lose by trying the later firmware. 

The hex file attached to that thread is compiled for a rfm69 so it will not run on your rfm12 device, I can recompile for a rfm12 no problem but it would be better at this stage if all testing was done with a common hardware/firmware combo.   

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

So more than a week later and with the rfm69 in place and updated to the firmware Paul provided in this thread everything is running fine. So I think my issues must have been down to a failing rfm12 (hardware).

Thought I'd give this thread a jog as I noticed Paul (Reed) had had another failure.

For the record my system is a PiB with Jessie Lite running on a USB HDD and an rfm69 running the firmware above.

Simon

PS - I've also rebooted my router a couple of times in the last week and Jessie Lite hasn't had any problems in reconnecting (Edimax dongle).

PPS Thanks to Paul as usual for the patience and taking the trouble to make the latest firmware for the rfm.

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

This is good news Simon.

Regarding your other rfm2pi, it may well be a firmware issue not a hardware issue, can you confirm it is definitely a v2 not a v1 (I'm sure we've done this but I want to be sure) ie it is a small surface mount mcu not a big 28 pin DIL ?

I can recompile the firmware for you to use on a rfm12b RFM2Pi v2 very easily, I have only refrained from doing so only to avoid confusion since we are only currently focusing on debugging the rfm69, perhaps we can try that down the line once the firmware is a little more tried and (time)tested, so don't give up on or bin the rfm12 device just yet.

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Hi Paul, was beginning to worry about your health! A week away, must have been vacation :-)

As for the rfm12 it's a V2.5.3 with the surface mount. I'd be glad to test it again with revised firmware although of course the system is running like a train at the mo' (fingers crossed of course) so maybe it would be worth waiting to see if it fails again. And as you imply fixing the 69 versions is much more important as the 12s are not being supplied any more.

Simon

pb66's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

"vacation" no such luck! Just very, very busy with a work project at the moment, lots of very long days, my sanity is as much a concern as my health right now :-)  But thanks for your thoughts :-)

I didn't intend to imply rfm69 was more important, just that there are more active "69" complainers ie testers and not complicating the issue with 2 different rfm modules seemed the wiser option, lets stick with "69" now for simplicity and then if/when fixed we can roll out to rfm12, these are perhaps the devices that will most need the updated firmware as they are further behind.

I have an almost finished script that will be incorporated into emonhub to "watch" a firmware repo (local or remote) and if the "version" increases it will update the rfm2pi automatically, it does not require the use of the "avrdude-rpi" mods it all done in-house by emonhub. I would hope to roll this out to all emonhub + rfm2pi users regardless of vintage of device to activate the code the user would just need to define the device ie 12 or 69 in emonhub.conf. 

This paves the way for a closer emonhub to rfm2pi integration as I am also working on "fault codes" as a shorthand way for the rfm2pi to communicate it's status or events to emonhub which will in turn react or at least log them in user friendly long hand messages. 

Paul

Bramco's picture

Re: Emonpi image, RFM69PI images froze, rest of Emonhub MQTT inputs still ok

Sounds like a good plan Paul. The more these things can be automated and provide sensible long hand messages the better. Although of course all that added functionality also needs to be idiot proofed - don't go there...

In the long term I'm planning to move away from rfm and onto wifi based systems (esp8266). My Heat Bank and CH controller (esp8266) now monitors all the temperature sensors, so the emonTx is only managing the PV diversion and reporting the power. As soon as I have that set up to manage the boiler properly with weather compensation etc. via node-red and MQTT, I'll be trying to build a replacement for the emonTx based on an esp8266 using an external ADC chip to start with. 

Simon

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.