Friday, March 20, 2009

Rollback segments explained

I recently read this in a book about data warehousing:

Source System Rollback Segments

When extracting from a relational source, extracts that take a long time can be problematic. If an extract asks for all records updated in the last 24 hours, the system must locate the appropriate set of records. This means that no user can change the updated_date field while your query is being processed. As transactions flow in during your query, they are queued up in a separate place called a rollback segment, to be applied once your request is finished. If your query takes too long, this queue gets too large and runs out of space. The system then kills your job and processes the transactions that have been queued up. In general, the folks responsible for the transaction system don't like this kind of behavior.

Now to be fair, the book was published in 1998 and is not specifically about Oracle. Does anyone know whether there has ever been a commercial DBMS that worked anything like this, or did they just make it up?


Flado said...

Weeell... If you take a black box approach, looking from your own session, it sure appears to work a bit like that in Oracle. I can imagine the reasoning going like that:
1. Your query does not see any changes made since it started.
2. Changes are not prevented (or "the folks responsible for the transaction system" would be much more upset), so they do happen while your query runs.
=> changes are neither prevented nor applied, ergo they must be stored somewhere until your query finishes.
3. Sometimes, esp. if too many concurrent changes happen while your query runs, it gets aborted with ORA-1555 saying that something called a rollback segment is too small
=> a-ha! that's where the changes are stored!
4. Immediately after getting an ORA-1555, your session sees the changes.
=> the queued changes were applied as soon as your query "released" the underlying data.

Actually, a pretty sensible hypothesis, that is until you notice that other sessions see all committed changes (not just their own) while your query still runs.
Really, given the amount of information available on Oracle internals back in 1998, I could've made that error myself. Maybe.

Noons said...

Been working with commercial DBMS systems since 1980 and quite frankly: still have to see one work like that.
Relational or not.
1998, eh? You been dusting the old book cabinet?

Anonymous said...

Until recently SQL Server and Sybase. And AFAIK Foxpro to this day still suffer from blocking reads.

Not sure that this ever had anything to do with anything called a "rollback segment".

But Sybase IQ in particular had such basic locking that it was necessary to serialise at the connection level since only one "updating" user was allowed at any given moment. :)

silentd said...

"This means that no user can change the updated_date field while your query is being processed."

So it's OK if users change other fields while your query is running, as long as that pesky updated_date field remains unchanged.

Anonymous said...

The scary part is how easy it was to figure out which book you were talking about, and then find Ralph's email. So why don't you ask him?

word: reaterse

William Robertson said...

@Anonymous#1: the "readers block writers" DBMSs simply lock the data being read, so that is not quite the mechanism being described.

@Anonymous#2: well spotted ;)

@Noons: actually I bought the book in 2006. I've just noticed a new edition appeared in 2008. That's it, my copy is going in the bin.

Anonymous said...

Hi .... which book is that? We might get a good laugh ...

John T said...

If it is for Oracle, they have it close, but with some huge errors.

Oracle will do read consistency, meaning you get a snapshot of the data as it was at the start of your query. If a record is changed after the start of your query, Oracle will read data from the rollback segments. This is to ensure you are getting the data as it was at the start of your query. Also, Oracle does this so it does not have to do blocking locks on updates during the execution of the query. Other sessions can continue to update the data unhindered.

A transactional system will use rollback segments in roughly a round robin sort of fashion. In this case, the rollback segments are marked as available once the transaction commits. (Keep in mind, Oracle by default reads committed data only, so it is only going to the rollback segments if there has been a committed transaction on your data since the beginning of your query) So if the transaction activity loops through all the rollback segments, and your query is dependent on one of those, your query will fail with a ORA-1555 Snapshot too old error.

LSH Expert said...

Can you explain, how this rollback process is involved in applications like Oracle LSH