Advice Application monitoring software?

SilverHood

FH is my second home
Joined
Dec 23, 2003
Messages
2,281
We had an outage at work on Monday that's causing my team endless frustration. A process managed by another team was down for 4 hours. My company won't spring for an expensive solution like Nagios, Dynatrace or ITRS Geneos to monitor applications. Does anyone know of any free or cheap solutions that we can use? All we really need is a tool with some basic process monitoring and a graphical dashboard, so we can confirm the servers / processes we depend on are up and running.
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
We had an outage at work on Monday that's causing my team endless frustration. A process managed by another team was down for 4 hours. My company won't spring for an expensive solution like Nagios, Dynatrace or ITRS Geneos to monitor applications. Does anyone know of any free or cheap solutions that we can use? All we really need is a tool with some basic process monitoring and a graphical dashboard, so we can confirm the servers / processes we depend on are up and running.

So the core issue is that it wasn't detected? We use ITRS ( I have it on my home pc right now for work) but a more basic solution should be out there, I would think.... maybe a small new outfit? Would you constitute a 'big name client' if you went with a company such as that?
 

SilverHood

FH is my second home
Joined
Dec 23, 2003
Messages
2,281
So the core issue is that it wasn't detected? We use ITRS ( I have it on my home pc right now for work) but a more basic solution should be out there, I would think.... maybe a small new outfit? Would you constitute a 'big name client' if you went with a company such as that?

Yeah, it wasn't detected. Our data exports were accepted, but went nowhere while this process was down. When we noticed something was wrong, we checked our entire data flow end to end and everything worked, but we couldn't figure it out. Eventually the other team checked the export process, and it was down... they restarted it, and everything went out, many hours later.

Every team has their own kludge of monitoring hacks, but if someone asked me "is everything up?" ... I have no way of checking, short of spending an hour going through all our apps one by one. Sometimes the checks themselves cause outages due to support people locking service accounts while doing health checks.

I used ITRS at my previous jobs and it does exactly what I need, but it's been deemed too expensive. Our scope for monitoring would only be about 60 servers, maybe 20 if we cut it down to the essentials. We're reasonably well known within our market, but we're a small company (2000 employees), so I don't think anyone is going to be impressed if we use their software. I almost think people would be shocked that we didn't have any software for monitoring, I know I was when I joined.
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
Yeah, it wasn't detected. Our data exports were accepted, but went nowhere while this process was down. When we noticed something was wrong, we checked our entire data flow end to end and everything worked, but we couldn't figure it out. Eventually the other team checked the export process, and it was down... they restarted it, and everything went out, many hours later.

Every team has their own kludge of monitoring hacks, but if someone asked me "is everything up?" ... I have no way of checking, short of spending an hour going through all our apps one by one. Sometimes the checks themselves cause outages due to support people locking service accounts while doing health checks.

I used ITRS at my previous jobs and it does exactly what I need, but it's been deemed too expensive. Our scope for monitoring would only be about 60 servers, maybe 20 if we cut it down to the essentials. We're reasonably well known within our market, but we're a small company (2000 employees), so I don't think anyone is going to be impressed if we use their software. I almost think people would be shocked that we didn't have any software for monitoring, I know I was when I joined.

Is it market data shizzle?
 

Raven

Happy Shopper Ray Mears
FH Subscriber
Joined
Dec 27, 2003
Messages
44,617
Oh, I had this issue for a week before anybody noticed it had all fallen over. Would love something like this!
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
No, it's for sending reports out, to clients, vendors and industry regulators. My RMDS market data shizzle problems were left behind a long time ago :)

RMDS ? I think you mean TREP! It's getting a rebrand again now, mind ;)
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
What's the new name? TREP Xtra? :D

Well... for now we're all 'Refinitiv' and Thomson need us to rebrand legally since the separation.... but in a couple of months it will probably happen again with the LSEG deal... FYI I am a service manager for a very large hosted TREP setup in EMEA... it's kinda cool (well, better than being unemployed, which I was 6 years ago) ;)
 

SilverHood

FH is my second home
Joined
Dec 23, 2003
Messages
2,281
Well... for now we're all 'Refinitiv' and Thomson need us to rebrand legally since the separation.... but in a couple of months it will probably happen again with the LSEG deal... FYI I am a service manager for a very large hosted TREP setup in EMEA... it's kinda cool (well, better than being unemployed, which I was 6 years ago) ;)

Can't they just retcon the name? The Refinitiv Enterprise Platform ?
 

Cadelin

Resident Freddy
Joined
Feb 18, 2004
Messages
2,514
My project uses Icinga (which is free) for exception handling monitoring and OpsGenie (which we pay for) for calling out on a subset of those exceptions.

We also use Telegraf -> InfluxDB -> Grafana for time series monitoring and Elasticsearch for analysing job logs. We don't pay for them.

We have about 1600 physical servers providing the UK computing contribution to the CERN experiments. We are scientists so we never have money for the fancy solutions.
 

Tay

Grumpy old fecker
Joined
Dec 23, 2003
Messages
1,310
We had an outage at work on Monday that's causing my team endless frustration. A process managed by another team was down for 4 hours. My company won't spring for an expensive solution like Nagios, Dynatrace or ITRS Geneos to monitor applications. Does anyone know of any free or cheap solutions that we can use? All we really need is a tool with some basic process monitoring and a graphical dashboard, so we can confirm the servers / processes we depend on are up and running.

I'm pretty sure that Nagios was and still is free for smallish environments. If you need more than that, then your company needs to work out the cost of the outage, the cost to fix the issue, risk mitigation etc etc. Once you do that, Nagios looks a lot more appealing.

However, getting Nagios is just the start; supporting Nagios itself is no mean feat.
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
Can't they just retcon the name? The Refinitiv Enterprise Platform ?

They probably could, but it's all legal schmegal shite.... we just get told 'brandx software can't be used after xx/xx/xxxx' and then have to go figure how to achieve that across an estate of 500 + servers with no budget... it sucks tbpfh :confused: :bootyshake: (y)
 

Overdriven

Dumpster Fire of The South
Joined
Jan 23, 2004
Messages
12,630
ELK Stack or Prometheus.

Or just write your own monitoring services which ping endpoints. If they go down, spam emails.
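For anyone who wants to try the "ping endpoints, email on failure" idea, here is a minimal Python sketch. It only tests whether a TCP port accepts a connection, which says nothing about whether the process behind it is actually doing its job. The endpoint labels, addresses and SMTP host are all placeholders I made up, not anything from this thread.

```python
import smtplib
import socket
from email.message import EmailMessage


def is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_down(endpoints, check=is_up):
    """endpoints maps a label to (host, port); return the labels that failed."""
    return [name for name, (host, port) in endpoints.items()
            if not check(host, port)]


def build_alert(down, sender, recipient):
    """Build (but do not send) an email listing the failed endpoints."""
    msg = EmailMessage()
    msg["Subject"] = "DOWN: " + ", ".join(down)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("No connection accepted by:\n" + "\n".join(down))
    return msg


def send_alert(msg, smtp_host):
    """Send via a plain SMTP relay; smtp_host is a placeholder for your own."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

You would run it from cron or a scheduled task with something like `down = find_down({"export": ("export-host", 8443)})` (hypothetical host and port) and call `send_alert` only when `down` is non-empty.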
 

old.Osy

No longer scrounging, still a bastard.
Joined
Dec 22, 2003
Messages
2,632
ELK Stack or Prometheus.

Or just write your own monitoring services which ping endpoints. If they go down, spam emails.

What he said regarding monitoring services. We had a Windows application service, and a scheduled task that ran a VB script to query the service status: if the service was hanging it would restart it, if it was stopped it would start it, and it would shoot an email either way.

Edit: Just want to say that little bit of scripting was in place for almost 10 years, saving us countless out-of-hours incidents and headaches.
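The logic of that watchdog is small enough to sketch. This is not the original VB script, just a rough Python illustration of the same decision: the status strings ("hung", "stopped") and the four callables are hypothetical hooks you would wire up to whatever your platform offers for querying and controlling a service.

```python
def watchdog(get_status, start, restart, notify):
    """Check one service once and repair it if needed.

    get_status() returns a status string; start(), restart() and
    notify(text) are supplied by the caller. All four are assumptions.
    """
    status = get_status()
    if status == "hung":
        restart()
        notify("Service was hung and has been restarted")
        return "restarted"
    if status == "stopped":
        start()
        notify("Service was stopped and has been started")
        return "started"
    # Any other status is treated as healthy: no action, no email.
    return "ok"
```

Run on a schedule, this sends a message only when it had to act, which is the behaviour described above.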
 

SilverHood

FH is my second home
Joined
Dec 23, 2003
Messages
2,281
Thanks everyone for the ideas. We have been writing our own scripts, and it works for us some of the time, but we want to move away from emails. If I have no emails alerting me of an outage, does that mean my systems are healthy? Or can I go to an app to check the health, and if we are green, then we're good? I'd prefer to be the latter, while right now we are the former.
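One common way to get from "no email means healthy?" to "green means healthy" is to have every check record a heartbeat each time it succeeds, and to treat a missing or stale heartbeat as red. A minimal Python sketch of that idea follows; the function names, the check names and the 300 second threshold are all made up for illustration, and a real dashboard would read the timestamps from wherever your scripts write them.

```python
def status_of(last_ok, now, max_age):
    """Green only if the check reported success within the last max_age seconds."""
    if last_ok is None:
        return "unknown"  # the check has never reported at all
    return "green" if now - last_ok <= max_age else "red"


def summarize(heartbeats, now, max_age=300):
    """heartbeats maps a check name to its last-success timestamp (or None).

    Returns (overall, per_check). Overall is green only when there is at
    least one check and every check is green, so silence never counts
    as healthy.
    """
    per_check = {name: status_of(ts, now, max_age)
                 for name, ts in heartbeats.items()}
    all_green = bool(per_check) and all(s == "green" for s in per_check.values())
    return ("green" if all_green else "red"), per_check
```

The point of the design is that a check which has died quietly shows up as red or unknown after `max_age`, instead of looking the same as a healthy one.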
 
