Server crashes solution?

Tilda

Moderator
Joined
Dec 22, 2003
Messages
5,755
Ok, from the info I have, a server crash goes something like this:

A zone crashes (e.g. the Midgard frontier)
All those people in the zone try to log back in
The Authentication server gets overloaded and crashes.
The auth server then has to reboot or something.

So, say that the auth server can normally cope with 400 people trying to log in at the same time. When it goes over this (i.e. when there's a mass LD) it crashes.
Why not write a little script that artificially limits the auth server to accepting 250 people? Then when it receives too many connects, it has a "buffer" of 150 people to overrun into.

An example (assuming the overload point for the auth server is 400 and there is no artificial cap enforced):

00:01 Zone crash
00:05 First people from booted zone try to log back in again
00:06 In the next 4 seconds 500 people try to log back in
00:10
00:11 Auth server dies


An example (assuming the overload point for the auth server is 400 and there is an artificial cap of 250 enforced):

00:01 Zone crash
00:05 First people from booted zone try to log back in again
00:06 In the next 4 seconds 500 people try to log back in
00:08 Auth server hits cap of 250
00:10 A few odd connections get through, so there are 261 people trying to log in. The server allows the extra 11 people to overrun into the buffer. However, no further connections are permitted. As these 261 people log in, more people are filtered through (numbers still limited to ensure the auth server doesn't die).

The result would be that, although people might take 5-10 secs longer to log on, the auth server would stay up and there would be no big problem.
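
As a rough illustration of the cap described above, here is a minimal Python sketch. It assumes the auth server can count the logins it has in progress. The 400 and 250 figures come from the example; `LoginGate`, `try_admit`, `finish` and `simulate_burst` are made-up names, not anything GOA or Mythic are known to run. Because the counter is guarded by a lock, nothing slips past the cap in this version, so the 150-slot buffer is pure safety margin.

```python
import threading

OVERLOAD_POINT = 400  # from the example: where the auth server is assumed to die
SOFT_CAP = 250        # from the example: the artificial limit


class LoginGate:
    """Hypothetical gate: counts logins in progress and refuses new ones at the cap."""

    def __init__(self, soft_cap: int) -> None:
        self.soft_cap = soft_cap
        self.in_progress = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        """Take a slot and return True if one is free, otherwise return False."""
        with self._lock:
            if self.in_progress >= self.soft_cap:
                return False
            self.in_progress += 1
            return True

    def finish(self) -> None:
        """Free a slot once a login has completed."""
        with self._lock:
            self.in_progress -= 1


def simulate_burst(gate: LoginGate, arrivals: int) -> tuple:
    """Throw `arrivals` login attempts at the gate; return (admitted, refused)."""
    admitted = sum(1 for _ in range(arrivals) if gate.try_admit())
    return admitted, arrivals - admitted
```

With a burst of 500, `simulate_burst(LoginGate(SOFT_CAP), 500)` admits 250 and turns 250 away, and each `finish()` frees a slot for someone who was refused.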

I don't know if this is what happens, and I don't know exactly what happens when a zone crashes.

What are people's views on this? Would it work?

Tilda
 

Arcee

Loyal Freddie
Joined
Jan 9, 2004
Messages
127
Good idea, but I don't think GOA would even bother listening. They really need to sort the problem out. They are making a lot of money out of DAoC customers and tbh, the game service is awful. A lot of people I know are moving to the US servers due to the Euro problems. I play PvP, so it doesn't bother me much, but when my brother comes into my room saying 'excal down again' it kicks me in the balls.
 

Kami

Can't get enough of FH
Joined
Dec 22, 2003
Messages
2,254
The game service is fine, I seriously think some people are overreacting as usual. Let them go to the US servers (which have the same problems btw), we'll be better off without them.

Tilda, I've no doubt that they're already looking into a solution, but maybe you should send them your idea in case they've missed it. Got to admit I doubt they'll want to limit the logins like that all the time, since it'd just mean a slower service and no doubt people would start moaning about it :(
 

Tilda

Moderator
Joined
Dec 22, 2003
Messages
5,755
Yeah Kami, I've emailed it to a few people.
I don't think that logging in would be significantly slower.
Look at it this way:
Without the limit, the auth server crashes and takes several hours to fix; with the limit, it might take you 5 mins to get back in game rather than 30 seconds.

Tilda
 

Fana

Fledgling Freddie
Joined
Dec 23, 2003
Messages
2,181
Good to see a creative suggestion instead of the usual "GOA sucks" thread. Let's hope something like this gets implemented; it sounds like a good fix until the root of the problem - the zone crashes - can be dealt with.
 

Arindra

Fledgling Freddie
Joined
Jan 5, 2004
Messages
163
You can't cap the number of people trying to connect to the server at once.

You can only cap the number of users who the server doesn't choose to ignore.

I would suspect the authentication server problems most likely come from too many incoming requests at once, rather than from how many logons it processes simultaneously.

If this is the case the auth server is effectively failing under a DDoS attack.

Remember you do not establish a connection as such; all that happens is that everyone throws packets at each other, and the auth server declares a 'connection' to exist when it recognises a new user and starts tracking the conversation it has with the new client. This cannot prevent 'unconnected' users from continuing to throw more packets in amongst the 'connected' ones.

It's standard practice to limit the number of open processes in all parts of any system, simply to reduce risk to the system from bugs or malicious attacks, never mind the problem of too many users. I'd be shocked if the auth server doesn't already have the limit you describe, Tilda, but all it can do is limit how many packet conversations it will act on; it cannot limit how many packets it receives.

That said, the client could use a few 'connection failed, retrying in 20 seconds, or press esc to exit' processes put in.
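
A client-side retry along those lines might look roughly like the Python sketch below. Only the 20-second figure and the wording of the message come from the suggestion above; `connect_with_retry`, its parameters and the `try_connect` callback are hypothetical stand-ins, since nothing here is based on how the real DAoC client works.

```python
import time
from typing import Callable


def connect_with_retry(
    try_connect: Callable[[], bool],
    max_attempts: int = 5,
    retry_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_connect up to max_attempts times, pausing between failures.

    try_connect is a placeholder for whatever the client does to log in.
    Returns True as soon as one attempt succeeds, False if all of them fail.
    """
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return True
        if attempt < max_attempts:
            print(f"connection failed, retrying in {retry_delay:.0f} seconds")
            sleep(retry_delay)
    return False
```

Passing `sleep` in as a parameter is only there so the waiting can be swapped out when trying the function; a real client would also need the "press esc to exit" part.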
 

Loxleyhood

Fledgling Freddie
Joined
Dec 22, 2003
Messages
2,228
If only Mids hadn't stooped to alarm clock, none of this would be happening.
 

Fana

Fledgling Freddie
Joined
Dec 23, 2003
Messages
2,181
Loxleyhood said:
If only Mids hadn't stooped to alarm clock, none of this would be happening.

Well, wouldn't the same thing happen if Mids tried a regular raid on Albs? The only difference would be that we would crash Albland instead. Or is it just the Mid frontier server(s) that are suffering under heavy load?
 

Loxleyhood

Fledgling Freddie
Joined
Dec 22, 2003
Messages
2,228
Not really. If Mid did a prime time raid, they should be in and out quickly. Thus the number of Albs in the zone would be lower, and probably the number of Mids in the zone too.

In the case of the recent fights, Albion has come to Midgard in open war, not really relic raids, so there are colossal numbers of people from both sides since there is no element of surprise, and Hibernia also got involved.
 

Blondy

Fledgling Freddie
Joined
Dec 24, 2003
Messages
294
Aye, nice idea. GOA did say they have upgraded their hardware to "top notch" and there won't be many more server crashes. Hopefully there won't; maybe they will bring a 3rd server in, maybe not ;)
 

Tilda

Moderator
Joined
Dec 22, 2003
Messages
5,755
OK, the way I imagine it:
I see a box (the auth server) with a pipe going into it.
Water flows through the pipe. It seems to me that using a smaller "pipe" would reduce the amount of water (join requests) reaching the server, so it's harder to overload.
 

Svartmetall

Great Unclean One
Joined
Jan 5, 2004
Messages
2,467
I pay GOA for a service.

They are not providing that service.

And yet I never see German or French servers, with higher populations, crashing every day like this. I do NOT believe GOA's claim of "top-notch" hardware, it doesn't fit the facts.
 

Fana

Fledgling Freddie
Joined
Dec 23, 2003
Messages
2,181
I don't know what to believe regarding the 'top-notch' statement, but as Loxley said, we have a bit of an unusual situation at the moment with the huge numbers of Albs and Mids warring in the Mid frontiers every evening.
It is however true that we pay for the ability to wage war on this precise scale, since it is an advertised feature of the game.
Personally I enjoy RvR on this scale - as long as it's playable - since it lessens the need for those "perfect" fotm groups (people seem much more accepting of odd group setups under a call to arms :) ). I'd hate to think we can't have any more of it :(

Btw, from your sig I take it you have a thing for trolls, Svartmetall ^^
 

Kami

Can't get enough of FH
Joined
Dec 22, 2003
Messages
2,254
Svart if you don't believe GOA perhaps you should go play on the US servers (which have the same problems as here) and bitch at Mythic instead ;)
 

Svartmetall

Great Unclean One
Joined
Jan 5, 2004
Messages
2,467
Kami, maybe GOA should just provide the service we're all paying for?

You may notice that Avalon, Stonehenge, Broceliande, Lyonesse and Ys - which all have higher regular populations than Excalibur - do NOT crash like this. And we're supposed to believe that they all have the same "top-notch" hardware? Sorry, I don't buy it.
 

Svartmetall

Great Unclean One
Joined
Jan 5, 2004
Messages
2,467
Fana said:
Btw, from your sig I take it you have a thing for trolls, Svartmetall ^^

heh, I just love the big lummoxes...they personify Midgard for me.
 

Kami

Can't get enough of FH
Joined
Dec 22, 2003
Messages
2,254
Svartmetall said:
Kami, maybe GOA should just provide the service we're all paying for?

You may notice that Avalon, Stonehenge, Broceliande, Lyonesse and Ys - which all have higher regular populations than Excalibur - do NOT crash like this. And we're supposed to believe that they all have the same "top-notch" hardware? Sorry, I don't buy it.
They are providing the service, but they can hardly do it when the hardware available today simply won't handle the gigantic zergs that happen so regularly on Excalibur. NO GAME today could handle that; you're living in a dream world. At the end of the day, if the game is pissing you off so much or you're so disappointed with the service you're getting, /quit. A game isn't worth getting so worked up about :)

I still think a lot of the problems are client based, where GOA/Mythic have no control. I know people who try to zerg with 56K modems, which is not very clever and surely has a negative effect on the lag caused in those situations. I'm not saying that it's their fault the server crashes, but if the clients can't handle the volume of information they're getting, it can't help the servers much. I've always been impressed that DAOC works so well. I remember sitting at Brit bank on Ultima Online doing sod all with a 2D client, and people were lag ghosting everywhere with only about 40 people on my screen. DAOC is a huge step forward.

DAOC runs fine until the realms build up huge zergs and run them in the same zone. We're not talking about 100+ on each side here, we're talking about several hundred v several hundred. That's a sodding huge amount of data to shift :( Remember how GOA were when the game first came out after Beta? They're vastly better now. Blaming them for the limitations of server hardware isn't particularly fair when there's probably no server out there that could cope with the demands being placed on them.
 

Darzil

Fledgling Freddie
Joined
Jan 10, 2004
Messages
2,651
Quote:
Originally Posted by Svartmetall

You may notice that Avalon, Stonehenge, Broceliande, Lyonesse and Ys - which all have higher regular populations than Excalibur - do NOT crash like this. And we're supposed to believe that they all have the same "top-notch" hardware? Sorry, I don't buy it.


Do we know that they don't crash? I've not been reading the French or German news, which is where such things would be mentioned.

Darzil
 

Arindra

Fledgling Freddie
Joined
Jan 5, 2004
Messages
163
Tilda said:
OK, the way I imagine it:
I see a box (the auth server) with a pipe going into it.
Water flows through the pipe. It seems to me that using a smaller "pipe" would reduce the amount of water (join requests) reaching the server, so it's harder to overload.

This is a good analogy.

*BUT*

The failure modes we need to consider are....

1) The pipe is too small to contain the incoming requests, and as a result it bursts, so that now no more requests get through. (Incoming network failure)

2) The reduced cross section of the pipe increases the pressure within the pipe, causing the server to break due to the pressure on it. (Server is overloaded with packets, and can't find the ones it considers 'connected' as a result).

3) Water/requests stack up in the pipe, until they go stale (individual packets expire) and thereby poison the server (make certain messages in conversations the server is following incomplete/corrupt).
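
To make modes 1 and 3 a bit more concrete, here is a small Python sketch of a bounded backlog that refuses new requests when it is full and throws away requests that have gone stale instead of processing them. `LoginBacklog`, `offer`, `take`, `max_waiting` and `max_age` are all invented for the illustration, and it assumes each request carries its arrival time; how the real auth server queues anything is unknown.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class LoginBacklog:
    """Hypothetical bounded queue of pending login requests."""

    def __init__(self, max_waiting: int, max_age: float) -> None:
        self.max_waiting = max_waiting  # how much "water" the pipe can hold
        self.max_age = max_age          # seconds before a request counts as stale
        self._queue: Deque[Tuple[float, str]] = deque()

    def offer(self, client_id: str, now: float) -> bool:
        """Queue a request, or refuse it outright if the backlog is full."""
        if len(self._queue) >= self.max_waiting:
            return False
        self._queue.append((now, client_id))
        return True

    def take(self, now: float) -> Optional[str]:
        """Return the oldest request that is still fresh, discarding stale ones."""
        while self._queue:
            arrived, client_id = self._queue.popleft()
            if now - arrived <= self.max_age:
                return client_id
        return None
```

Refusing early and dropping stale entries keeps the server from working on conversations the client has already given up on. It does nothing about mode 2, since the packets still arrive whether or not they are queued.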
 

the_hermit

Fledgling Freddie
Joined
Dec 23, 2003
Messages
195
Svartmetall said:
heh, I just love the big lummoxes...they personify Midgard for me.

Love them, just don't LOVE them.... ;)


But regardless of how it is described, we either need a new server or a new cistern :D Like was said previously, the other servers with higher populations never sit on the page with "*down* 0 users".

And did I read elsewhere that Prydwen just had a full scale war in their RvR lands, sans crashing?
 

Teknoid

Fledgling Freddie
Joined
Jan 6, 2004
Messages
66
TBH, this is just getting stupid, the server is crashing at least once every day now.
I would guess if they were taken over by Microsoft, it would be classed as a "feature". But I am paying for two accounts each month, and I am not getting to spend half the time on the server that I used to. My gaming time, like most players', is in the evenings, and that is always disrupted. IMHO, the usual "1 free day for all the hassles" just isn't enough; the lack of admin skills being applied to the Exc server is ruining night after night of gaming time. Not to mention each reset tends to roll back 5 minutes of play time, which in DF could be the difference between a dead HL and a repop of mobs on your back.

To top it off, there is no warning to the players still on the Exc login that the server is about to be rebooted, so we don't have a chance to organise our raids and quit in a safe spot.

I am guessing that GOA must be running their machines on Dell computers, cause the hardware must be really crap quality. It scares me to think that they can consider upgrading the game, adding new features and heavier graphics, and trying to pull in an even higher player count, when they cannot even provide a stable server.
Horizons and SWG do not suffer from this poor performance. I believe Horizons has a timed reboot of the server at the most inactive time of day (3-4am). Shame GOA cannot think of such genius ideas.

Teknoid :sex:
 

Tilda

Moderator
Joined
Dec 22, 2003
Messages
5,755
Let's keep this on topic guys, this isn't a general whine thread, it's a suggestions thread for solutions.
 

Ziva

Fluffy
Joined
Dec 22, 2003
Messages
651
I don't think such a script is possible. Afaik the only thing you can limit is user sessions, and if I am correct GOA has server groups, meaning each server has its own job and max user count (in this case it would also mean that if, let's say, 1000 users were on the same server, it would also crash).

I think the only thing that might solve the problem is to make the servers redundant, meaning multiple servers perform the same job, sharing a database that uses a clustered setup. That would spread out people performing the same task, like logging in or being in the same zone.

Furthermore I have to add that GOA has fewer problems keeping DAoC up than I have keeping my own PC up :rolleyes: I personally don't have much of a problem with their (imo low amount of) downtime, but maybe that's just because I have never seen a PC or server with 100% uptime (remembers a certain number of pretty important DNS servers going down a while ago).
 

Rhirap

Fledgling Freddie
Joined
Jan 3, 2004
Messages
286
Or, as they say each zone is run by its own server, maybe a possibility would be to split up the zone a little more, spreading the load across multiple servers. Probably not the most financially efficient way of doing things, but it is going to be a growing problem until it gets fixed.

Although, credit where credit is due, the last few crashes have been resolved extremely quickly. Just eliminating the crashes in the first place would be better ;)
 

GReaper

Part of the furniture
Joined
Dec 22, 2003
Messages
1,983
Restricting the number of connections to the auth server should be very easy.

Linux (with iptables) allows you to limit the number of connections over a set amount of time (e.g. per minute). You could easily limit it to 100/minute or something.
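
iptables would do this in the kernel. Purely to show the general idea of that kind of rate limit, here is a token-bucket sketch in Python. It is not how iptables is configured or implemented; `TokenBucket`, `allow` and the `burst` parameter are names made up for the example, and the caller is assumed to pass in the current time in seconds.

```python
class TokenBucket:
    """Hypothetical limiter: at most `burst` connections at once, refilled over time."""

    def __init__(self, rate_per_minute: float, burst: int) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a new connection may be accepted at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Top the bucket up for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(rate_per_minute=100, burst=100)`, a burst of 500 attempts in the same instant gets 100 accepted and 400 refused; the refused clients would have to try again as the bucket refills.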
 

Alliandre

Fledgling Freddie
Joined
Dec 28, 2003
Messages
202
I think someone suggested somewhere else that you'd have to go through a load screen to get onto a different server. This means that at your home tk's you'd have to load into the different area (for example, Mids heading to Odin's would have to load out into the frontier as if into a dungeon). This would mean there wasn't a mix of people levelling all over the mainland of Mid alongside the zerg of Albs and Mids, with a few Hibs trying to take advantage.

I also heard that they'll be doing this with the new frontiers, along with the other changes to them.

From what I can tell, GOA would not be able to implement this solution on their own and will have to wait for the RvR patch, while hoping for better equipment to come out.

Take into account that these crashes have ONLY happened over the last week, when there's been an exceptional number of people zerging in Odin's. I doubt that the same has happened on the French/German servers recently. It's also not happened on Prydwen. GOA has given very reliable service with the stability of the servers imo.
 

Arindra

Fledgling Freddie
Joined
Jan 5, 2004
Messages
163
GReaper said:
Restricting the number of connections to the auth server should be very easy.

Linux (with iptables) allows you to limit the number of connections over a set amount of time (e.g. per minute). You could easily limit it to 100/minute or something.

Easy? Yes.

Useful? Debatable.

Probably already in place? I should think so.
 

Alkoran

One of Freddy's beloved
Joined
Dec 22, 2003
Messages
130
Since all connections (or at least the large majority) to the auth server are made by people using the DAoC client, could modifications not be made there?

I assume it is not possible for the client to easily identify a zone crash as it occurs, so the system would have to kick in for every unexplained disconnection. This means it can't take too much time to run.

How about this: in the case of an unexplained termination of a connection, the client writes to a local file to record that this has occurred. Every time you attempt to connect, the client checks this file. If the last connection was lost unexpectedly, the client makes contact (of the minimal kind) with a remote GOA system of some sort (not the auth server). If this system responds, you begin a normal login and the local file is cleared. If the remote system does not respond, the client inserts a random delay (it's no use turning loads of people away only to have them all try again at the same time) and then attempts to contact the auth server.

In the case of a zone crash, the remote system is disconnected in order to spread the incoming auth server connections over a larger timeframe. Crashes of the remote system are possible, hence only checking it once. If the remote system is not functioning correctly, people will still be able to connect, but will suffer a delay on login until the system is fixed. The most likely cause for the remote system to fail is a large number of connections in a short space of time. This is most likely to occur after a zone crash, in which case the remote system should already be disconnected. This means that the connection for this remote system does not have to be an expensive one, as it is only used by people reconnecting after an unexpected loss of connection, and when there are a lot of these it is not supposed to respond anyway.

Like a fuse really: its purpose is to break when a surge occurs so that the rest of the system does not suffer said surge.
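
The client side of this could be sketched in Python roughly as follows. Everything named here is hypothetical: the flag file, `mark_unclean_disconnect`, `login_delay`, the `fuse_responds` callback standing in for the "remote GOA system", and the 60-second ceiling on the random delay. The sketch also assumes the flag is left in place when the fuse is blown, to be cleared on a later login.

```python
import random
from pathlib import Path
from typing import Callable, Optional


def mark_unclean_disconnect(flag: Path) -> None:
    """Record in a local file that the last connection ended unexpectedly."""
    flag.write_text("unclean")


def login_delay(
    flag: Path,
    fuse_responds: Callable[[], bool],
    max_delay: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how many seconds to wait before contacting the auth server."""
    if not flag.exists():
        return 0.0            # last session ended cleanly: normal login
    if fuse_responds():       # checked once only, as described above
        flag.unlink()         # fuse intact: clear the file, normal login
        return 0.0
    # Fuse blown (or broken): spread reconnects out with a random wait.
    rng = rng or random.Random()
    return rng.uniform(0.0, max_delay)
```

The random wait is what stops everyone who was booted from retrying in the same second; the length of the window would have to be tuned against how many logins the auth server can actually absorb.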

/edit Apologies for bad typing and long explanation.
 

GReaper

Part of the furniture
Joined
Dec 22, 2003
Messages
1,983
Arindra said:
Easy? Yes.

Useful? Debatable.

Probably already in place? I should think so.

Useful? Yes. Would prevent too many people connecting at once causing a Denial of Service, the server wouldn't be overloaded.

Probably already in place? Doubt it. Have you ever had a connection refused due to too many people connecting at once?
 
