Sunday, April 08, 2007

DATE comparisons: the scenic route

Need to calculate the number of minutes between two dates? Yes, the dull way is to subtract one from the other and multiply by 1440. But why do that, when you can simply convert each date into Julian format by converting it into a string and then back to a date, and converting the resulting date into a Julian string, and (in a separate step) applying TO_NUMBER to the result to get an integer, so that you can simply subtract one from the other and multiply by 1440.

Except that rounds to the nearest day, so an additional step is to calculate the minutes since midnight for each of the two dates, which you can do by simply converting each date into an 'HH:MI AM' string, then back to a date, then back to an 'SSSSS' string, converting it to a number, dividing by 60 and adding it to the result of the first calculation. I think.

Anyway, consider the following library of handy date functions our Oracle WTF Easter gift to you, the online development community.

CREATE PACKAGE dates_pkg
AS
    FUNCTION julian_date
        ( date_to_convert DATE )
        RETURN NUMBER;

    FUNCTION minutes_since_midnight
        ( timevalue DATE )
        RETURN NUMBER;

    FUNCTION minutes_elapsed
        ( lowdate DATE
        , highdate DATE )
        RETURN NUMBER;

END dates_pkg;
/

CREATE PACKAGE BODY dates_pkg
AS
    FUNCTION julian_date
        ( date_to_convert DATE)
        RETURN NUMBER
    IS
        varch_value VARCHAR (10);
        num_value NUMBER (20);
    BEGIN
        SELECT TO_CHAR
               ( TO_DATE(TO_CHAR(date_to_convert,'MM/DD/YYYY'),'MM/DD/YYYY')
               , 'J')
        INTO   varch_value
        FROM   dual;

        SELECT TO_NUMBER (varch_value)
        INTO   num_value
        FROM   dual;

        RETURN (num_value);
    END julian_date;


    FUNCTION minutes_since_midnight (
        timevalue DATE)
        RETURN NUMBER
    IS
        secs_elapsed NUMBER (20);
        mins_elapsed NUMBER (20);
    BEGIN
        SELECT TO_NUMBER
               ( TO_CHAR(TO_DATE(TO_CHAR(timevalue,'HH:MI AM'),'HH:MI AM')
               , 'SSSSS') )
        INTO   secs_elapsed
        FROM   dual;

        SELECT (secs_elapsed / 60)
        INTO   mins_elapsed
        FROM   dual;

        RETURN (mins_elapsed);
    END minutes_since_midnight;


    FUNCTION minutes_elapsed
        ( lowdate DATE
        , highdate DATE )
        RETURN NUMBER
    IS
        final_number NUMBER (20);
        low_julian NUMBER (20);
        high_julian NUMBER (20);
        num_days NUMBER (20);
        num_minutes NUMBER (20);
        temp_mins NUMBER (20);
        min_low NUMBER (20);
        min_high NUMBER (20);
    BEGIN
        SELECT julian_date (lowdate)
        INTO   low_julian
        FROM   dual;

        SELECT julian_date (highdate)
        INTO   high_julian
        FROM   dual;

        SELECT (high_julian - low_julian)
        INTO   num_days
        FROM   dual;

        SELECT (num_days * 1440)
        INTO   num_minutes
        FROM   dual;

        SELECT minutes_since_midnight (lowdate)
        INTO   min_low
        FROM   dual;

        SELECT minutes_since_midnight (highdate)
        INTO   min_high
        FROM   dual;

        SELECT (min_high - min_low)
        INTO   temp_mins
        FROM   dual;

        SELECT (num_minutes + temp_mins)
        INTO   final_number
        FROM   dual;

        RETURN (final_number);

    END minutes_elapsed;
END dates_pkg;

Just for fun, let's test it:

CREATE TABLE wtf_test (start_date NOT NULL, end_date NOT NULL) AS
SELECT DATE '2006-12-25' + DBMS_RANDOM.VALUE(1,365)
     , DATE '2007-12-25' + DBMS_RANDOM.VALUE(1,365)
FROM   dual CONNECT BY LEVEL <= 1000;

-- ...several runs here to allow for caching etc, last set of results shown...

SQL> set timing on autotrace traceonly stat

SQL> SELECT dates_pkg.minutes_elapsed(start_date,end_date) FROM wtf_test;

1000 rows selected.

Elapsed: 00:00:03.96

Statistics
----------------------------------------------------------
  16000  recursive calls
      0  db block gets
     74  consistent gets
      0  physical reads
      0  redo size
   9330  bytes sent via SQL*Net to client
    809  bytes received via SQL*Net from client
     68  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
   1000  rows processed

SQL> SELECT (end_date - start_date) * 1440 FROM wtf_test;

1000 rows selected.

Elapsed: 00:00:00.16

Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
     74  consistent gets
      0  physical reads
      0  redo size
  25485  bytes sent via SQL*Net to client
    809  bytes received via SQL*Net from client
     68  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
   1000  rows processed

So the handy package version takes 25 times as long as the 1-line SQL version.

And in the interests of fairness, in case you're thinking perhaps that is just the normal overhead of calling PL/SQL functions in SQL, let's try our own function:

CREATE FUNCTION minutes_elapsed
    ( lowdate DATE
    , highdate DATE )
    RETURN NUMBER
AS
BEGIN
    RETURN (highdate - lowdate) * 1440;
END minutes_elapsed;
/

SQL> SELECT minutes_elapsed(start_date,end_date) FROM wtf_test;

1000 rows selected.

Elapsed: 00:00:00.26

Statistics
----------------------------------------------------------
     21  recursive calls
      0  db block gets
    107  consistent gets
      0  physical reads
      0  redo size
  25496  bytes sent via SQL*Net to client
    809  bytes received via SQL*Net from client
     68  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
   1000  rows processed

Still 15 times faster.

Many thanks to Padders for sharing this one.

3 comments:

Maxim said...

Well, i think it is still not completely fair in regard plsql.
And i don't agree, to call plsql from within sql is evil - sometimes it is best possible solution. Pl sql (equally as java or python) have right to their existance. Evil are context switches (and this is eqally truth for both - plsql from sql and sql from plsql). So, if one has a need to switch to pl sql solution (the need is not very obvious in this case), he should try to minimize/consolidate sql calls). In this package no real data is processed, the pure math should be done, here pl sql is by all means not so bad. Simply by changing all
select to_char(...whatever) into var from blabla;
into
var := to_char(...whatever);
i got reduced the time to 0.24s and recursive calls to 62 ( from over 16000). Here are two runs


SQL> SELECT dates_pkg_plsql.minutes_elapsed(start_date,end_date) FROM wtf_test;

1000 rows selected.

Elapsed: 00:00:00.24

Statistics
----------------------------------------------------------
62 recursive calls
0 db block gets
97 consistent gets
3 physical reads
0 redo size
15116 bytes sent via SQL*Net to client
1126 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1000 rows processed

SQL> SELECT dates_pkg.minutes_elapsed(start_date,end_date) FROM wtf_test;

1000 rows selected.

Elapsed: 00:00:02.58

Statistics
----------------------------------------------------------
16063 recursive calls
0 db block gets
122 consistent gets
0 physical reads
0 redo size
15110 bytes sent via SQL*Net to client
1126 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
9 sorts (memory)
0 sorts (disk)
1000 rows processed


Best regards

Maxim

William Robertson said...

Yes, I already did that test using a standalone "minutes_elapsed" function. I could also have modified dates_pkg and got similar results, but I wanted to keep the demo readable. I'm not sure what point you think I was making.

Maxim said...

Sorry, then i misunderstood your intention. Still, see a lot of plsql code around where nearly every variable assignment is done by select into instead of plsql assignment - many developers simply don't realize the fact, that former is a context switch, very sad...

Best regards

Maxim