Emonhub failing?

Hi,  My emoncms feeds/inputs fail randomly.  Sometimes it will run for 12+ hours, other times only 2 minutes.  I have a system with the data being sent back to a raspberry pi as serial data, and when emoncms fails I can still see the pi receiving from the sensors using minicom.

I set up the pi to send to both local and remote emoncms, but they both stopped displaying updates this morning.

This is an error I captured from the pi:

Traceback (most recent call last):
  File "/usr/share/emonhub/emonhub.py", line 336, in <module>
    hub.run()
  File "/usr/share/emonhub/emonhub.py", line 90, in run
    values = I.read()
  File "/home/pi/emonhub/src/emonhub_interfacer.py", line 368, in read
    return self._process_frame(f, t)
  File "/home/pi/emonhub/src/emonhub_interfacer.py", line 106, in _process_frame
    frame = self._decode_frame(ref, validated)
  File "/home/pi/emonhub/src/emonhub_interfacer.py", line 228, in _decode_frame
    decoded.insert(0, int(node))
ValueError: invalid literal for int() with base 10: '.48'

My data is in node 10 with 5 values (4 CT one VT)

Can I view the local database to see the invalid value?

I have seen the notify module as an add on for emailing if feeds are down,  can this be used to restart emonhub as well as email?

I also considered restarting the Pi once per day, but sometimes the feeds still won't work after a restart, so it takes two restarts.

Has anyone had a similar problem or could provide any advice?

 

Thanks,

David.

 

David46's picture

Re: Emonhub failing?

After more testing with minicom, I am pretty sure the emonhub service stops when minicom reports an incorrect line.

10 966.87 -534.54 0.29 -0.16 240.80   is the normal output, but when it crashes I had a line missing the node and the first 2 CT values

Originally I was using copper for the direct serial connection, but I have since replaced that with a 433 MHz wireless serial link.

Is it possible to make it discard this packet with missing data and continue?

 

Thanks

 

pb66's picture

Re: Emonhub failing?

Hi David.

It indeed looks like incomplete data is the root problem and finding the cause of that should be the main focus.

There is no question emonhub should handle this exception far more gracefully, and I can take a look into that. But while emonhub can do some integrity checks, such as checking whether a value is a float or within a range, it can't fully validate all data. Yes, in the case of ".48" that obviously isn't an int, so that packet could be discarded, but how many times has "10.48" been passed as 10 without a problem when it was intended to be 210.48 or -4410.48? And if the ".48" is discarded gracefully (as it should be), then unless you check the logs regularly you are oblivious to the problem until you spot anomalies in the data you are collecting.

Partial numbers are difficult to detect but partial frames can be a little easier. If you were receiving data in individual bytes and reconstructing to real numbers in emonhub, then a "datacodes" string could provide a fixed value count, but since you are receiving real values the "datacodes" decoding is not used so no checks are done. It may be handy if emonhub was able to use this to confirm the value count if demanded and I can look into that too, but this still wouldn't prevent partial numbers distorting the data.
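As a rough illustration of the kind of integrity check discussed above, here is a minimal Python sketch. It is not emonhub's actual code: `parse_frame` is a hypothetical helper, and treating the first token as the node id with the rest as real values is an assumption based on the serial output shown earlier in the thread. It returns `None` rather than raising when a token is non-numeric.

```python
def parse_frame(line):
    """Return (node, values) for a space-separated frame, or None if invalid.

    Hypothetical helper for illustration; not part of emonhub.
    """
    tokens = line.split()
    if len(tokens) < 2:
        return None  # empty frame, or a node id with no values
    node_token, value_tokens = tokens[0], tokens[1:]
    # str.isdigit() rejects '.48', '-5' and '10.48', so only whole
    # non-negative numbers are accepted as a node id.
    if not node_token.isdigit():
        return None
    try:
        values = [float(v) for v in value_tokens]
    except ValueError:
        return None  # at least one value could not be parsed as a float
    return int(node_token), values


print(parse_frame("10 966.87 -534.54 0.29 -0.16 240.80"))
print(parse_frame(".48 -0.16 240.80"))
```

A check like this only catches tokens that are not numbers at all. A truncated value such as "10.48" arriving in place of "210.48" still parses cleanly, which is the limitation described above.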

How are you collecting data? What rf "system" are you using? The rf12s we use discard any incomplete packets, so it's not something emonhub or other software needs to worry too much about, hence the unnoticed "hole" :-)

Paul

pb66's picture

Re: Emonhub failing?

Thinking about the value count, there is a possibility emonhub may be able to check this already if you put an entry in the [nodes] section of the conf eg

[nodes]
        [[10]]
                 datacodes = 0, 0, 0, 0, 0

where each value is declared as a datatype "0". The "0" datatype was intended to tell emonhub not to decode real values and is the default datacode for all interfacers except the one for the RFM2Pi. Currently, if you do not have any [nodes] data in the conf for node 10, emonhub will basically expect an unknown number of datatype "0" values. I haven't tried this, but in theory the string above will tell emonhub to "do nothing" for each of the 5 expected values and, if there aren't 5 values to "do nothing" to, discard the frame. There's no guarantee it will work, but the possibility is strong enough to justify trying it out.

If the frame is incomplete, it should log a warning saying "RX data length: 4 is not valid for datacodes 0, 0, 0, 0, 0", for example, if only 4 of the 5 expected values are found.
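The value-count check described here can be sketched in a few lines of Python. This is only an illustration of the idea, not emonhub's implementation: `check_length` and the logger name are hypothetical, and the warning text simply mirrors the message quoted above.

```python
import logging

log = logging.getLogger("frame_check")  # hypothetical logger name


def check_length(values, datacodes):
    """Return True if the number of received values matches the datacodes.

    Hypothetical helper; emonhub's own check may differ.
    """
    if len(values) != len(datacodes):
        log.warning(
            "RX data length: %d is not valid for datacodes %s",
            len(values),
            ", ".join(str(code) for code in datacodes),
        )
        return False
    return True


datacodes = [0, 0, 0, 0, 0]  # the five "do nothing" codes for node 10
print(check_length([966.87, -534.54, 0.29, -0.16, 240.80], datacodes))
print(check_length([0.29, -0.16, 240.80], datacodes))
```

A frame that passes this check has the right number of values, but nothing here says the values themselves are intact.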

A node id type check will be added when I get a mo to decide the best place for it and give it a test.

Paul

David46's picture

Re: Emonhub failing?

Hi Paul thanks for the reply,

 

I am using the Shield_CT1234_Voltage sketch with the output :

// Print to serial
    Serial.print(nodeID); Serial.print(' ');
    Serial.print(ct1.realPower); Serial.print(' ');
    Serial.print(ct2.realPower); Serial.print(' ');
    Serial.print(ct3.realPower); Serial.print(' ');
    Serial.print(ct4.realPower); Serial.print(' ');
    Serial.println(ct1.Vrms);

being sent to an HC-11 433 MHz Wireless to TTL CC1101 module. This is then received by another of the same modules feeding the Rx pin on the Raspberry Pi.

In the emonhub.conf I have hashed out the RFM2Pi and added a serial direct interfacer.

Nodes is set as:

# List of nodes by node ID
# 'datacode' is default for node and 'datacodes' are per value data codes.
# if both are present 'datacode' is ignored in favour of 'datacodes'
[[99]]
datacode = h
datacodes = h, f, f, f, f, f

Are you saying that if I use the interfacer for the RFM2Pi then the packets would be processed better? Or is my hardware not suitable for that because it is effectively a dumb repeater. 

Does datacode = h provide the best error checking, whereas my datacodes = h, f, f, f, f, f is what causes the errors? Maybe the h could be a non-numerical value to identify the node, and if it is not followed by 5 f values, everything is discarded until a new h is received. Would using a non-numerical h remove errors where an f value could be mistaken for the h that starts the packet?

David.

edit: sorry, I typed the above before reading your last post. I have just changed the conf file to replace the datacodes f with 0 and will let you know the results.

 

 

 

pb66's picture

Re: Emonhub failing?

Hi David,

I'm not familiar with the rf devices you are using but I gather the sketch is just "printing" real human readable values which are then repeated via rf to the serial gpio, so I expect the "normal output" like that in bold in the 2nd post is exactly as you see in minicom. Therefore the values do not need decoding and are dealt with by emonhub's default "0" datatype. 

The hypothetical (out of range) node [[99]] in the conf was just an example template, and changing the datacodes as you did will not have had any effect (unless you actually have a node 99, of course). I note you used datacodes h, f, f, f, f, f to include the node id. The datacodes are actually a template or key for decoding just the data, so the h wasn't required even if the f's had been correct for the data.

Up until now the node ids have only been numbers, more specifically usually between 1 and 31, and emonhub is not intended to process anything non-numeric (yet). I think the "0" datatypes may provide a check for incomplete packets.

The correct interfacer to use is indeed the serial interfacer, the RFM2Pi 'Jee' interfacer is too tailored to the RFM2Pi to work with what you have.

Let's see how, or if, the "0" datacodes help. With a better node id test as well, you should be good.

Paul 

 

David46's picture

Re: Emonhub failing?

Hi,

It worked for a while, but broke again. Initially, when I changed the conf file, I restarted the Pi. Maybe I did not wait long enough, but the local emoncms did not start updating feeds. I tried to start it manually but ended up with a few errors: different things already started or failed to start?

 

Anyway eventually I got it all going for maybe 8 hours,  it seemed to handle "bad packets" without causing the emonhub service to stop.  When I woke this morning it had already failed, then after 2 hours it failed again, each time requiring restart of the service.

 

These are the last two error logs:

2014-12-22 15:44:50,079 WARNING 493 Discarded RX frame 'non-numerical content' : ['\xa6\xa6\x02j\x82r\x82\x92\x02\x92\xa2\x92r\x92\xc2j\n10', '1319.71', '-751.36', '0.41', '0.57', '242.27']
2014-12-22 15:45:31,029 WARNING 498 Discarded RX frame 'non-numerical content' : ['\xaa\x02\x82r\x92\x82\x02\x92\xa2\x9ar\x82\x9aj\n10', '1304.53', '-744.21', '-0.10', '-0.27', '241.87']

I do notice that I have a lot more inputs now: 2 from each of nodes 4, 5, 7 and 14. Also, keys 6, 7 and 8 from my normal node 10 have values 21 hours old?

 

I have left another putty session open to constantly watch the minicom events
 

 

edit:    failed again and the only irregularity I can see in minicom is

10 98.83 566.68 0.29 -0.02 238.40
10 101.39 563.63 0.08 -0.31 237.22
5 -0.03 239.12
10 100.87 548.33 -0.54 -0.07 238.97
10 94.80 537.05 0.08 0.74 235.65

That node 5 with 2 values

Restarted emonhub twice,  did not help so restarted pi-  all good again

 

 

 

pb66's picture

Re: Emonhub failing?

It's quite difficult to tell what's happening without the full picture, but reading between the lines a little here and there, I can see the issues definitely need tackling before the data reaches emonhub.

The "last two error logs" tell us firstly that emonhub handled those bad packets by recognizing the "flaw" and discarding them with a warning. They also tell us that those packets are not the cause of the failure, as emonhub continued to function after the first in order to report the second. Then there's the content, which is not "incomplete" or even just 2 messages amalgamated: the frames contain raw bytes (shown as hex escapes) passed as a string too.
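The first element of each logged list ends in "\n10". That would be consistent with junk bytes and a newline arriving just before a good frame, with the whole chunk then being split on spaces. This is an assumption about how the list was produced, not a reading of emonhub's code. A minimal Python sketch of that behaviour, together with one possible pre-filter (the hypothetical `last_line`):

```python
def last_line(raw):
    """Keep only the text after the final newline inside a received chunk.

    Hypothetical pre-filter; not something emonhub is known to do.
    """
    return raw.rsplit("\n", 1)[-1]


# Junk bytes, a newline, then a complete frame (junk shortened from the log).
raw = "\xa6\xa6\x02j\n10 1319.71 -751.36 0.41 0.57 242.27"

# Splitting on spaces leaves the junk glued to the node id.
print(raw.split(" ")[0] == "\xa6\xa6\x02j\n10")
# Dropping everything before the last newline leaves a clean frame.
print(last_line(raw).split(" "))
```

Whether rescuing such frames is wise is a separate question, since the junk itself suggests the link is delivering bytes that were never sent by the sketch.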

Old Inactive inputs on node 10 is a good sign as that suggests the "0" datacodes are doing their bit and only "5 value" node 10 frames are getting through.

If you are seeing "a lot more inputs now", that may be a good thing: with emonhub filtering out more of the bad frames, more is getting through. But it also suggests how "random" the resulting data might be, as only the node id irregularities are highlighted by the creation of an extra node. Something else to consider is that emoncms discards any frames that have a node id outside the range 0-32, so it is not known how many other inputs would be created if that wasn't the case.

The "node 5 with 2 values" is just half a packet and would have been passed to emoncms as normal; emonhub has no way of knowing it's not genuine.

The fact that restarting emonhub doesn't clear the problem every time actually suggests something else is at fault. 

emonhub is designed to catch the occasional fault rather than pick out good packets from an unmanaged stream. Tackling this is maybe a good exercise for the development of emonhub, as it tests the error handling to an increased level and can help emonhub become more robust. But from your point of view, the more it discards and filters out, the less data you're left with, and with a high probability of error in the remaining data it will at best be inconsistent, and therefore unreliable and possibly misleading. It would take far less effort and yield much better results to tackle the issue at source.

This probably isn't what you want to hear, but I think you should reconnect using a direct wired connection and, if all is well, then look at improving the wireless link, either by building on what you have or by switching to a known good rf link. The rfm2pi board is relatively cheap and an rfm12 or rfm69 will drop straight onto the shield.
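One way of "building on what you have" would be to add a per-line checksum, so the receiving end can reject damaged or partial lines before they are treated as data. The Python sketch below shows only the receiving-side idea. The "*XX" suffix format and the `xor_checksum`, `add_checksum` and `verify` helpers are all hypothetical, nothing in the existing sketch or in emonhub produces or expects them, and the Arduino sketch would need an equivalent change that is not shown here.

```python
from functools import reduce


def xor_checksum(payload):
    """XOR of all bytes in the ASCII payload, as two upper-case hex digits."""
    return "%02X" % reduce(lambda acc, b: acc ^ b, payload.encode("ascii"), 0)


def add_checksum(payload):
    """What the sender would emit: payload, '*', then the checksum."""
    return payload + "*" + xor_checksum(payload)


def verify(line):
    """Return the payload if the checksum matches, otherwise None."""
    payload, sep, received = line.strip().rpartition("*")
    if not sep:
        return None  # no checksum field at all
    try:
        return payload if xor_checksum(payload) == received.upper() else None
    except UnicodeEncodeError:
        return None  # non-ASCII junk in the payload


good = add_checksum("10 966.87 -534.54 0.29 -0.16 240.80")
print(verify(good))      # intact line
print(verify(good[3:]))  # same line with the node id cut off the front
```

A one-byte XOR is weak and will miss some corruptions; a CRC would be a stronger choice. A link that discards bad packets itself, as the rfm12 does, avoids the need for any of this.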

Paul

 

 

pb66's picture

Re: Emonhub failing?

Just to clarify, I'm not suggesting emonhub is without fault. We have clearly already identified that a non-numeric node id, including a partial number such as ".2" (decimal point but no leading digit), will cause emonhub to fall over. That needs to be, and will be, addressed, and it may well improve things a little for your data. But I'm very aware that emonhub is unlikely to be able to condition the current input stream enough to make it credible.

Paul

pb66's picture

Re: Emonhub failing?

So I started looking into the above "non-numeric node id" issue and found it would be caught by the recent change made to catch empty frames. Similarly, the empty frames shouldn't have reached emonhub, but that doesn't mean emonhub should be vulnerable to such occurrences (see issue for more info).

I have now merged those changes from the "testing" branch so if you update emonhub to rc1.2 using a git pull we can eliminate those errors causing emonhub to fail.

Paul 

David46's picture

Re: Emonhub failing?

Hi Paul, thank you for all your help so far. I will update emonhub soon and see how much it improves, and will see if I can reconnect the serial link with copper. That will be a bit involved since they are each installed in position. The Raspberry Pi will probably have to be moved closer to the switchboard, but it will be good to remove the wireless link to help isolate my fault(s).

$ tail -f /var/log/emonhub/emonhub.log
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
timeout: timed out

2014-12-23 22:21:05,136 WARNING Send failure: wanted 'ok' but got
2014-12-23 22:35:59,212 WARNING Couldn't send to server, URLError: timed out
2014-12-23 22:35:59,214 WARNING Send failure: wanted 'ok' but got

This was my last log entry but it is all still working ok at 23:00

 

pb66's picture

Re: Emonhub failing?

Hi David, I'm not sure why, but those logs point to an issue at the emoncms end. The request is timing out before emonhub gets the "ok" response it expects.

TrystanLea's picture

Re: Emonhub failing?

Are you posting to emoncms.org? There is an intermittent load issue on emoncms.org at the moment that may cause the timeout. That would affect a couple of posts every now and then, while most are received correctly.

 

David46's picture

Re: Emonhub failing?

Hi,  yes I am posting to emoncms.org and to the local RPi.

Prior to today I had about 3 days of uninterrupted data. I'm pretty sure this lined up with upgrading emonhub to version 1.2. Thank you for your good work on that, Paul. My log file was still showing errors (unable to send to server, non-numerical, string too short, and I think a node id error), but it kept running and processing what it could.

Today's fault was something in the switchboard, but it is all running fine again.

 

2014-12-30 16:55:02,576 WARNING 1 Discarded RX frame 'non-numerical content' : ['\xc2\x11\x02\x13\x10\x05\x1a\x13\x13\x0c\x0e\n\x13\x1b\x01\t\x0e\x19\x1a\x13\x17\x15\x02\x13\x10\x15\x19\x10\x12\t\x06\x12\x02\x12\x18\x19\t\x0c\x19\x02\x12\x19\r\x02\x13\x17\x05\n\x13\x03\n\x1e10', '1640.67', '-1236.34', '-0.07', '0.16', '235.72']
 

The above error, similar to what we have seen from my system before, does not seem to be caused by my wireless link. A friend tried a setup with a different sketch modified for serial, using a copper link to the RPi, and was getting similar garbage in his log.

 

Maybe some other direct serial users could provide feedback, although the newer emonhub does not seem troubled by it.

 

Also, is emoncms configurable so that it logs accumulated data over my own 24-hour period? My daily totals seem to turn up in line with your midnight, which means the totals do not match my inverter.

 

Thanks

pb66's picture

Re: Emonhub failing?

Good to hear you've made some progress. The log excerpt shown includes a string of control codes (shown as hex escapes) and, as it was the first line read, maybe it is some setup data from the rf module. Do you see these anomalies in minicom too?

Paul

David46's picture

Re: Emonhub failing?

Hi, I did not have an instance of minicom running at the time, but I have seen it a bit in the emonhub log file. I thought maybe it was my garage door remote, but I could not reproduce it. Possibly something from the neighbours' devices.
