How our users exploited concurrency and how we fixed it

A story of a game exploit

Once upon a time I developed a somewhat popular web game called Forumwarz. At its peak, we were serving about 6 million dynamic requests a day off a single quad-core server.

Forumwarz limits how many turns a player can take in a day. We designed it this way so that the competitive aspect of the game wasn’t simply a contest of who had the most time available to play. Players had to choose their targets wisely.

A side effect of having limited turns was that we had a good idea of the maximum score a player could earn in a day. However, a few weeks after the game started to become popular, we noticed some people were earning several times more than should have been possible. Somehow they had figured out a way to exploit the game, and it was giving them a huge advantage on our leaderboards!

I reviewed the history of a couple of the offending players. The curious thing I discovered was that they weren’t taking any more turns than the others; they were just being rewarded much more.

There was only one spot in the codebase that rewarded players, and it looked something like this:

# Keep everything in a database transaction
Player.transaction do
  # Find the player's current goal
  goal = player.goal
  # Make sure we don't reward goals that have already been completed
  unless goal.completed?
    goal.update_column :completed, true
    player.increment!(:score, goal.score)
  end
end

After much headdesking, I eventually discovered that the above code is not safe under concurrency.

In Rails’ development mode on your local machine, the code will always work properly. This is why the exploit never came up while I was testing the game. However, if you run it in a concurrent environment (such as our quad-core server) players can reward themselves multiple times by making many requests in a very short period of time.

This turned out to be exactly what our players were doing. They had discovered that if they submitted their final blow many times in quick succession, they'd be rewarded more than once.

Why does the code fail?

If you step through the above code with multiple processes in mind, the error is fairly obvious. Let’s say two requests come in to your web server at the same time. Both will reach this line:

goal = player.goal

which will trigger a database query that looks like this:

SELECT goals.* FROM goals WHERE goals.player_id = 1234

At this point, the database will return the same values for both queries, as the update statement hasn’t happened yet. This means that when we reach this line:

unless goal.completed?

Both of them will pass! The player's score will be updated twice.
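To make the race concrete, here is a rough way to reproduce it from a Rails console. This is only a sketch: it assumes a seeded player record with id 1234 and a connection pool large enough for both threads.

player = Player.find(1234)

2.times.map do
  Thread.new do
    Player.transaction do
      goal = player.goal
      unless goal.completed?
        goal.update_column :completed, true
        player.increment!(:score, goal.score)
      end
    end
  end
end.each(&:join)

# Under the default "read committed" isolation, both threads can see
# completed == false, so the score may be incremented twice.
puts player.reload.score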

The Solution

After I discovered the cause of the exploit, I was a little worried. I'll be the first to admit I'm wet behind the ears when it comes to multithreading and concurrency. I was afraid I'd have to resort to semaphores or some other exotic programming construct.

Fortunately, I soon found an easy solution: let the database handle the concurrency. Developers much smarter than me have put thousands of hours of work into databases to make sure they hold up under concurrent situations such as these. All I had to do was leverage their hard work.

Here’s the solution I came up with:

Player.transaction do
  # Find the player's current goal
  goal = player.goal
  # Set completed to true, but only where it's currently false
  row_count = Goal.update_all "completed = true", ["player_id = ? AND completed = false", player.id]
  # Update the player's score only if completed actually changed in the database
  if row_count == 1
    player.increment!(:score, goal.score)
  end
end

The key to the above solution is that your RDBMS returns a count of how many rows were changed when you execute an UPDATE. Only one request will receive a row count of 1 back; all others will receive 0 and skip the reward. It just works!
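On newer Rails versions the same guarded update is usually written through a relation, which also returns the number of affected rows. A sketch, assuming Rails 4 or later:

Player.transaction do
  goal = player.goal
  # update_all returns the number of rows the UPDATE actually changed
  updated = Goal.where(player_id: player.id, completed: false).update_all(completed: true)
  player.increment!(:score, goal.score) if updated == 1
end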

I highly recommend using the above approach any time you want to trigger a secondary effect following a successful update. It’s easy to write and your application will be a lot safer.


Imported from: http://eviltrout.com/2013/01/10/exploiting-concurrency.html

Interesting read, thanks.

It's not the "quad-core server" that introduces this race condition -- even if you only had a single server with a single core, as long as you were running more than one Rails process (eg Passenger max processes greater than 1, which is the default) and/or allowing concurrent request handling in your Rails app (which is not the default and nobody enables these days, but might be the default in Rails 4), the db-access-based race condition would be there. Yep, even with a single core.

Indeed, database transactions are a good solution to the db-access-based race condition. Any solution is going to involve the db -- some form of 'optimistic locking' would possibly be another one that may scale better. http://api.rubyonrails.org/cla...
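For reference, Rails' built-in optimistic locking only requires adding a lock_version integer column to goals; a conflicting concurrent save then raises, and the loser backs off. A minimal sketch, not the article's actual code:

begin
  goal = player.goal
  unless goal.completed?
    goal.completed = true
    goal.save!  # raises ActiveRecord::StaleObjectError if another process saved first
    player.increment!(:score, goal.score)
  end
rescue ActiveRecord::StaleObjectError
  # A concurrent request completed the goal between our read and our write; award nothing.
end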

What database are you using? MySQL?

Assuming you are using mysql, this works because an 'update' statement in mysql (even though it contains a read and a write internally) is considered 'atomic'.

Had a similar issue with a small game I once developed in Java. Both gravity and the player were trying to change the velocity of the spacecraft, but because multiple threads were acting on those two variables concurrently, it wasn't working correctly.

A solution that worked for me was to use a test-and-set mechanism with Java's own ConcurrentHashMap, which guaranteed linearizable updates to the key/value pairs.
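A rough Ruby analogue of that test-and-set idea, using the concurrent-ruby gem (the names here are illustrative, not from the game above):

require 'concurrent'

completed_goals = Concurrent::Map.new

# put_if_absent is atomic: it returns nil for exactly one caller per key,
# no matter how many threads race on the same goal_id.
def claim(completed_goals, goal_id)
  completed_goals.put_if_absent(goal_id, true).nil?
end

claim(completed_goals, 42)  # => true for the first caller only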

Question: Why not just put a stored procedure in the database? Something like:

WITH PlayerGoal AS
(
    SELECT *
    FROM tbl_Goal go WITH (UPDLOCK, ROWLOCK)
    INNER JOIN tbl_Player pl ON pl.PlayerID = go.PlayerID AND go.PlayerID = @PlayerID
    WHERE go.Completed = 0
)
UPDATE PlayerGoal
SET Score = Score + 1,
    Completed = 1

Your solution looks like it takes two trips to the database, and now you have to think about database things in your code, instead of just throwing some things over the wall.

In my experience, stored procedures, triggers, views, etc, are not very common in the world of MySQL-as-used-by-engineers. I think a lot of that comes from a tradition of "MySQL is very fast and pretty good as long as you keep it happy" and keeping it happy was defined as "don't do anything fancy on the database".

There's also something to be said for processing on your application servers, since you probably have N of those and only M databases, and N is usually way higher than M and far easier to increase. (Not that the example in this particular case is really intensive, there are certainly things that could be.)

Transactions are the de facto way to do this, especially in financial systems...which is my guess as to where the term transaction (in SQL context) comes from.

Looks like what's happening is a dirty read. You should use a different transaction isolation mode, not Read Uncommitted.

http://www.postgresql.org/docs...

As others have said, the solution is simply to choose the correct isolation level for this unit of work. I'm not as experienced with Ruby, but in C# when you create a transaction object to wrap around your ORM code you can specify the isolation level for the block of work.

The other lesson people should take from this is that ORMs are great (I use them for all my data access code), but they don't remove the need to understand SQL and database engines. You always need to keep in mind what SQL the ORM tool will generate. It can bite you in the ass.
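One cheap habit that helps here: ask the ORM what it is about to send before you trust it. In Rails, for instance:

puts Goal.where(player_id: 1234, completed: false).to_sql
# => SELECT "goals".* FROM "goals" WHERE "goals"."player_id" = 1234 AND ...
# (exact quoting varies by adapter)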

There is a SELECT ... FOR UPDATE statement for this, and in Rails .lock(true) does it for you, i.e. player.goal.lock(true) should do this. FOR UPDATE issues a read lock on the column for other transactions until this one completes. I'm using this in production in our queueing API and it has never failed me. In your case that single update statement is better, but in many cases it can't be handled by one update, e.g. when touching multiple tables (bank transactions, etc.).

*read lock for that row (in innodb).
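Applied to the article's code, the pessimistic-locking version might look like the sketch below, using Rails' built-in lock! helper (which re-reads the row with SELECT ... FOR UPDATE):

Player.transaction do
  goal = player.goal.lock!  # blocks concurrent transactions on this row until we commit
  unless goal.completed?
    goal.update_column :completed, true
    player.increment!(:score, goal.score)
  end
end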

It's a shame that `transaction do ... end` was useless here. It seems to me that "Serializable" should be the default transaction isolation mode. That's what a transaction _is_.

Right. The default isolation level in Postgres is "read committed" (same as Oracle and MS SQL Server), which means any queries you issue at the beginning of a transaction basically aren't really inside the transaction. You are racing against other users when you rely on the results of those queries.
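For completeness, here is roughly what the stricter-isolation route looks like, assuming Rails 4+'s isolation option. Under serializable isolation the database aborts one of the racing transactions, so the caller must be ready to retry (newer Rails versions raise the more specific ActiveRecord::SerializationFailure):

begin
  Player.transaction(isolation: :serializable) do
    goal = player.goal
    unless goal.completed?
      goal.update_column :completed, true
      player.increment!(:score, goal.score)
    end
  end
rescue ActiveRecord::StatementInvalid => e
  retry if e.message =~ /serial/i  # the losing transaction was aborted; run it again
  raise
end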

The problem has nothing to do with Rails.

A better solution is to have a separate goal_completions table. Put a unique index on goal_completions.goal_id, and you've ensured that a goal can't be completed more than once. Use a trigger or after_create for updating the score.
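Sketched out with a hypothetical GoalCompletion model and a unique index on goal_completions.goal_id:

# In a migration:
#   add_index :goal_completions, :goal_id, unique: true

begin
  GoalCompletion.create!(goal_id: goal.id)
  player.increment!(:score, goal.score)  # or move this into an after_create callback
rescue ActiveRecord::RecordNotUnique
  # The unique index guarantees only one completion row can ever exist for a goal,
  # so every racing duplicate lands here and awards nothing.
end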

Excellent article on the subtlety of concurrency.

MongoDB, having no transactions, has an interesting (and atomic!) approach to this: update-if-not-different. Single assignment / increment / list append/delete / dictionary key addition/removal operations can be executed conditionally. Even better, if you really only have a single operation to perform (i.e. the player score increment as above) you can perform that update *asynchronously*. Multiple extraneous and simultaneous calls in the event a user tries to perform the above exploit will be silently and naturally ignored, costing the application server basically nothing and ensuring a speedy UX.
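With the Ruby mongo driver that conditional update might look like the following sketch, where goals and players stand in for Mongo::Collection handles:

# The filter makes the write a no-op unless the goal is still open.
result = goals.update_one(
  { _id: goal_id, completed: false },
  { '$set' => { completed: true } }
)
if result.modified_count == 1
  players.update_one({ _id: player_id }, { '$inc' => { score: reward } })
end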

Now, for one of the web-based games I've worked on in the past we also avoided immediately updating player scores (health, energy, [x items], etc.) and instead deferred the operation to the next whole save of the user record. We basically built a transactional system for scoring on top of Mongo; score change operations were queued until ratified by a larger-scale User record save operation and before that point could be audited. (We generally ratified this on a per-user rotating schedule or when too many score transactions had built up.) Additionally, since the energy score was always incrementing at a constant (time-derived) rate it became trivial to not bother Mongo with updating the value; only when energy was consumed was that value updated. Before that the value was calculated from the last update time of the User record and their 'last known' energy level.

All in the name of performance. ;)
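The time-derived energy value described above is cheap to compute on read; something like this sketch (names illustrative, not from the original system):

def current_energy(user, rate_per_second, max_energy)
  elapsed = Time.now - user.updated_at
  [user.last_known_energy + rate_per_second * elapsed, max_energy].min
end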

Some frameworks break Postgres' natural behaviour; Django (yeah, yeah, Python), for example, issues all statements within a per-request transaction… and has almost no way to efficiently utilize replication slaves unless you use a proxy which ignores transaction start instructions until a non-select is issued. Django's ORM is terrible, though.