Life and the day job have been hitting pretty hard lately, but I've slowly been chipping away at my network library. This weekend I spent some serious coding time trying to relearn what I'd done, as it's quite complicated. Fortunately, I left myself lots of notes.
Still, it's left me with some design decisions to make, so there was a lot of mental pacing. The prototype was heavily based on only what I'd need for Mythruna, and the new one will be more general and open source... and more better. But that requires rethinking some fundamental things.
Also, the design of the 'reliability' portion changed even over the course of the prototype, and based on that, some things can be renamed and simplified. In 4 hours of banging my head against it today, I discovered even more things that can be cleaned up. The process is kind of interesting sometimes:
"Gee, how can that work?"
"Ack, there's no way that works; it's got to be a bug..."
...frantic diagramming and scribbling...
"Ahah, no, this other thing makes it work OK, so I did handle it before..."
"Man, there must be a better way to do this so I don't fall into that rabbit hole next time..."
Sometimes I was even nice enough to leave myself comments along the lines of "if you think this is broken, here is why it probably isn't..."
Still, those are all signs of areas that need to be cleaned up. Confusing code has a bad smell.
One would think this sort of thing can't be too difficult... but it gets quite complicated. I mean, sending network state is relatively straightforward... it's the sending-it-efficiently part that will drive you up a tree. Throw in that you are using a fast but unreliable style of network messaging, and things get even more complicated.
For example, if I were sending packets reliably, then I could just send the values that changed since the last message. The client is guaranteed to have gotten the previous message, so no problem. This is a double-edged sword, though, because ALL messages are guaranteed to get there. Even the ones that are now obsolete. With this naive approach, a network hiccup can cause a noticeable (several-second) pause while lost packets are resent and everything queued behind them waits, and then suddenly all of the objects rapidly race through the history they missed.
On the other hand, sending messages unreliably is faster right out of the gate... but the messages may never get there, or they may arrive out of order. You could just blast the full state of every object every time... that's quite common... and hope for the best. Each new message makes every previous one kind of obsolete anyway.
The thing is, that turns out to be a lot of data to send, and for proper visualization you want some amount of history anyway.
I kind of take the best of both worlds: I send only what has changed, but I send it over a fast, unreliable transport. This does mean that I need to handle reliability myself on some level, and that's where things get a little complicated.
Here is a glimpse of what I mean:
On both the client and the server (per player), we keep a sort of local view of what the world looks like. For each object that we are tracking, we keep a baseline state and the actual state.
When communicating with the client, we only send the differences between the baseline and the actual state. For example, if the object hasn't moved but has rotated, then we only send the rotation. (There are more values than that: zoneId, objectId, and so on, all of which are tracked this way.)
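To make the baseline/delta idea a bit more concrete, here is a rough Java sketch (Java 16+ for records). It is not the library's actual code: the `DeltaSketch`, `ObjectState`, `Delta`, `diff()` and `apply()` names, the bitmask of changed fields, and the choice of zoneId/position/rotation as the tracked values are all assumptions for illustration. For simplicity, objectId is used here only as the identity key.

```java
/**
 * Hypothetical sketch of baseline-vs-actual delta encoding for one tracked object.
 * Requires Java 16+ (records). Field set and bitmask layout are illustrative only.
 */
public class DeltaSketch {
    // Bit flags marking which fields a delta actually carries.
    static final int ZONE_ID  = 1;
    static final int POSITION = 1 << 1;
    static final int ROTATION = 1 << 2;

    record Vec3(float x, float y, float z) {}
    record Quat(float x, float y, float z, float w) {}

    /** Full state of one object; objectId is treated as the identity key here. */
    record ObjectState(long objectId, int zoneId, Vec3 position, Quat rotation) {}

    /** Only the fields flagged in mask are meaningful; the rest are placeholders. */
    record Delta(long objectId, int mask, int zoneId, Vec3 position, Quat rotation) {}

    /** Computes what actually has to go over the wire. */
    static Delta diff(ObjectState baseline, ObjectState actual) {
        int mask = 0;
        if (baseline.zoneId() != actual.zoneId()) mask |= ZONE_ID;
        if (!baseline.position().equals(actual.position())) mask |= POSITION;
        if (!baseline.rotation().equals(actual.rotation())) mask |= ROTATION;
        return new Delta(actual.objectId(), mask,
                (mask & ZONE_ID) != 0 ? actual.zoneId() : 0,
                (mask & POSITION) != 0 ? actual.position() : null,
                (mask & ROTATION) != 0 ? actual.rotation() : null);
    }

    /** Rebuilds the actual state on the receiving side from the shared baseline. */
    static ObjectState apply(ObjectState baseline, Delta d) {
        return new ObjectState(baseline.objectId(),
                (d.mask() & ZONE_ID) != 0 ? d.zoneId() : baseline.zoneId(),
                (d.mask() & POSITION) != 0 ? d.position() : baseline.position(),
                (d.mask() & ROTATION) != 0 ? d.rotation() : baseline.rotation());
    }

    public static void main(String[] args) {
        ObjectState base = new ObjectState(42, 1, new Vec3(0, 0, 0), new Quat(0, 0, 0, 1));
        // Same position, new rotation: only the rotation should be sent.
        ObjectState now = new ObjectState(42, 1, new Vec3(0, 0, 0), new Quat(0, 0.7071f, 0, 0.7071f));
        Delta d = diff(base, now);
        System.out.println(d.mask() == ROTATION);        // true
        System.out.println(apply(base, d).equals(now));  // true
    }
}
```

Note that `apply(baseline, diff(baseline, actual))` only reproduces `actual` if both ends use the same baseline. Apply the delta to a different baseline and you silently get a different object.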
This works as long as the client and the server (per player) have a common idea of what the baseline is. It's super vital.
Each message that goes out gets a sequence number. The client sends an "acknowledged" message (ACK) for each sequence number that it saw. So the server can know that "ahah, the client knows this much state..."
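On the server side, the sequence-number/ACK bookkeeping might look something like the sketch below. It is hypothetical (`AckTracker`, `send()` and `onAck()` are made-up names) and assumes plain `int` sequence numbers that never wrap around. The state carried by each outgoing message is remembered by sequence number, and when the client ACKs one, that state becomes the server's new baseline. Late ACKs for older messages are ignored so the baseline never moves backwards.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical per-player server-side tracker: remembers the state sent in each
 * message and promotes it to the baseline once the client ACKs that message.
 * Assumes int sequence numbers that never wrap around.
 */
public class AckTracker<S> {
    private final NavigableMap<Integer, S> sent = new TreeMap<>();
    private int nextSeq = 1;
    private int baselineSeq = 0; // 0 means "still on the initial baseline"
    private S baseline;

    public AckTracker(S initialBaseline) {
        this.baseline = initialBaseline;
    }

    /** Remembers the state carried by an outgoing message; returns its sequence number. */
    public int send(S actualState) {
        int seq = nextSeq++;
        sent.put(seq, actualState);
        return seq;
    }

    /** Called when an ACK for seq arrives from the client. */
    public void onAck(int seq) {
        S state = sent.get(seq);
        if (state == null || seq <= baselineSeq) {
            return; // duplicate, stale, or unknown ACK: the baseline never moves backwards
        }
        baseline = state;
        baselineSeq = seq;
        // Anything at or before seq can no longer become the baseline.
        sent.headMap(seq, true).clear();
    }

    public S baseline() { return baseline; }
    public int baselineSeq() { return baselineSeq; }

    public static void main(String[] args) {
        AckTracker<String> server = new AckTracker<>("empty");
        server.send("state@1");
        server.send("state@2");
        server.send("state@3");
        server.onAck(2); // the ACK for msg 2 arrives first
        System.out.println(server.baselineSeq() + " " + server.baseline()); // 2 state@2
        server.onAck(1); // a delayed ACK for msg 1 changes nothing
        System.out.println(server.baselineSeq()); // 2
    }
}
```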
But hold on a sec, what if the server never gets that ACK? The client can't update its own baseline until it knows the server knows it knows what it knows. Fun, eh?
Each message that the server sends to the client also includes a list of the client ACKs that it has received. It's a double-ACK.
So, server sends msg 1, msg 2, msg 3...
Client sees msg 2 and sends an ACK for 2.
Server sees that ACK and starts sending that as part of its subsequent messages... and its new baseline is now based on that state it knows the client already got.
Client sees msg 4 and knows the server got its ACK for 2... it can update its own baseline based on that... now they are in sync.
...client sends an ACK for message 4 just like always.
Server sees the ACK for message 4 and now knows that the double-ACKs it has been sending to the client in every message (here, the one for 2) can now be removed.
The key is that the double-ACK list is sent with the new object state. It gives the client something it can be sure of. It's complicated enough that I had to recheck part of it as I was typing the above.
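Here is a rough client-side sketch of that walk-through, again in hypothetical Java (`ClientView`, `Message` and `receive()` are illustrative names, and object state is simplified to a map of field name to integer value). It assumes that the newest entry in a message's double-ACK list tells the client which of its previously received states the server is now using as the baseline, so the client rebases on that state before applying the delta. Out-of-order and duplicate messages are not handled, to keep it short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical client-side view of one object. State is simplified to a map of
 * field name -> value. Assumes the newest double-ACK in a message names the
 * baseline that message's delta was computed against.
 */
public class ClientView {
    /** A server message: sequence number, double-ACK list, and changed fields only. */
    record Message(int seq, List<Integer> doubleAcks, Map<String, Integer> delta) {}

    // Full states the client reconstructed, keyed by the message that produced them.
    private final NavigableMap<Integer, Map<String, Integer>> received = new TreeMap<>();
    private Map<String, Integer> baseline = new HashMap<>();
    private int baselineSeq = 0; // 0 means "still on the initial, empty baseline"
    private Map<String, Integer> current = new HashMap<>();

    /** Applies a message and returns the sequence number to ACK back to the server. */
    public int receive(Message m) {
        // A double-ACK proves the server got our ACK, so it is safe to rebase on it.
        for (int acked : m.doubleAcks()) {
            if (acked > baselineSeq && received.containsKey(acked)) {
                baseline = received.get(acked);
                baselineSeq = acked;
            }
        }
        received.headMap(baselineSeq, false).clear(); // older states are no longer needed
        Map<String, Integer> state = new HashMap<>(baseline);
        state.putAll(m.delta());
        received.put(m.seq(), state);
        current = state; // out-of-order messages are not handled in this sketch
        return m.seq();
    }

    public int baselineSeq() { return baselineSeq; }
    public Map<String, Integer> current() { return Collections.unmodifiableMap(current); }

    public static void main(String[] args) {
        ClientView client = new ClientView();
        // msg 1 is lost; msg 2 arrives with a delta against the empty baseline.
        int ack = client.receive(new Message(2, List.of(), Map.of("x", 2, "r", 0)));
        // msg 3 is lost too. The server got ACK 2, rebased on it, and now sends
        // only the rotation change along with the double-ACK for 2.
        client.receive(new Message(4, List.of(ack), Map.of("r", 90)));
        System.out.println(client.baselineSeq() + " x=" + client.current().get("x")
                + " r=" + client.current().get("r")); // 2 x=2 r=90
    }
}
```

The `main` method replays the msg 1 through msg 4 sequence from the list: the client only moves its baseline to 2 once msg 4 proves the server received ACK 2.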
Generally, if the network connection is running well, then there will rarely be more than one or two ACKs in that header. Since the header contains all of the outstanding client ACKs, an ACK for any one message means we can clear out every entry that message carried (keeping only the ones that arrived after it was sent). And chances are, if messages are being dropped heavily in one direction, then they are being dropped heavily in the other, too, so it's not like the list will grow too quickly.
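That pruning rule could be sketched on the server side like this. The `DoubleAckHeader` class and its methods are hypothetical, and the sketch assumes the server keeps a copy of the header each outgoing message carried. When the client ACKs message N, every ACK that was in message N's header is known to have reached the client and can be dropped, while ACKs that arrived after N was sent stay in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical server-side bookkeeping for the double-ACK header.
 * pending = client ACKs the server has received but has not yet seen the
 * client acknowledge a header containing them.
 */
public class DoubleAckHeader {
    private final TreeSet<Integer> pending = new TreeSet<>();
    // The exact header carried by each outgoing message, keyed by its sequence number.
    private final NavigableMap<Integer, Set<Integer>> headerBySeq = new TreeMap<>();

    /** Builds the header for outgoing message seq: every pending ACK. */
    public List<Integer> headerFor(int seq) {
        headerBySeq.put(seq, new TreeSet<>(pending));
        return new ArrayList<>(pending);
    }

    /** Called when the client ACKs message seq. */
    public void onClientAck(int seq) {
        Set<Integer> seenByClient = headerBySeq.get(seq);
        if (seenByClient == null) {
            return; // duplicate, stale, or unknown ACK
        }
        pending.removeAll(seenByClient); // the client saw these in message seq
        pending.add(seq);                // but this new ACK still has to be echoed back
        headerBySeq.headMap(seq, true).clear(); // late ACKs for older messages are ignored
    }

    public static void main(String[] args) {
        DoubleAckHeader server = new DoubleAckHeader();
        server.headerFor(1);
        server.headerFor(2);
        server.headerFor(3);
        server.onClientAck(2);                   // client saw msg 2
        System.out.println(server.headerFor(4)); // [2]
        server.onClientAck(4);                   // client saw msg 4 and its header
        System.out.println(server.headerFor(5)); // [4]
    }
}
```

Keeping a small copy of each header costs a little memory, but it makes "clear everything except the newer ones" exact even when ACKs arrive out of order.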
Anyway, today was family day, but the entire time I was working over these design ideas in my head. Took the kids to see "Age of Ultron". During the lulls, I'm thinking about how to rename all of the different classes that have the word "State" in them... there's like 15 of them. Grilling steaks, I'm trying to figure out how to make the object protocol more configurable. Roasting marshmallows over the fire pit, I'm trying to keep myself from drifting off but still thinking about whether the object delta stuff can be separated out... and so on.
It was a good day, though. Everyone went to bed and I was ready to tear into design... I've spent the last 4-5 hours just writing things up. I feel pretty good about it... so it must be time to go to sleep and have nice dreams before the light of day shows me my mistakes.