Friday 1 December 2017

Cast your NULLs in case you use CASE!

You might tolerate a programming environment that is unable to cast (for example) numbers in string "containers" to numbers, but not being able to cast a NULL value from one datatype to another could drive any Buddhist monk crazy!

So lock your Buddhist monks away, because that can happen in [Oracle] SQL!

However, the issue tends to manifest itself in situations that are hardly worthy of being called "best practice"!

(Wow! Three exclamation marks already! Well, five including this sentence. This post better be good.)

You stored your Dates as WHAT?!


You might remember Richard Foote's articles (1, 2, and 3) on the consequences of storing Date values as Numbers or Varchars.

This is another example of the consequences of such bad practices. I recently came across this example, which made this post happen.

SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    CASE
        WHEN NUM = 0 THEN DECODE(NUM,0,NULL,TO_DATE(TO_CHAR(NUM), 'YYYYMMDD'))
        ELSE
            SYSDATE
    END N
FROM RND;
The "RND" factored subquery represents a horizontal-and-vertical slice of a staging table. Dates were stored as Numbers and if not present, stored as 0!

During the ETL, the "date" value was merged into a table, taking into account an existing "Date" value from another table (represented here by SYSDATE), using a CASE expression. The main SELECT statement represents an equivalent of that MERGE.

Note: Do not take the "WHEN NUM = 0" literally. It's just a simple/silly replacement for a meaningful predicate.

This is the result:
SQL Error: ORA-00932: inconsistent datatypes: expected CHAR got DATE
00932. 00000 -  "inconsistent datatypes: expected %s got %s"

It's too confusing for Oracle, too!


It takes a couple of rewrites to understand that the "DECODE" is not the culprit here.
SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    DECODE(NUM,0,NULL,TO_DATE(TO_CHAR(NUM), 'YYYYMMDD')) AS N
FROM RND;

N                
------------------
09-OCT-17

09-OCT-17

09-OCT-17
09-OCT-17
09-OCT-17

09-OCT-17

9 rows selected.
As you can see the DECODE function has no problems handling the NULL value alongside a value of type Date.

"In case you use CASE"


You don't need to be Bryn Llewellyn to infer that the CASE expression seems to be making sure that the results of all of its branches have matching datatypes.

In this case, it's really easy for Oracle to find a reference point: SYSDATE readily reveals the expected datatype. But when "NUM = 0", the output of the DECODE function is NULL.

What is the intrinsic data type of NULL?


Just to expose myself to humour, I'd say: NULL!

My completely uneducated (and somewhat silly) guess is that, at some point during interpretation of the CASE expression, Oracle must be comparing the output datatypes of the different branches, and (if you're living in fantasy-land and imagine that Oracle would still use SQL for that!) it would run a statement like this:

SQL>SELECT
        CASE
            WHEN 'DATE' = NULL THEN 'DATE'
            ELSE 'NOT DATE'
        END
    "DATE = NULL?"
FROM
    dual;

DATE = N?
--------
NOT DATE

Why is DECODE not guilty?


Because of the ridiculous "TO_DATE(NULL)" in the statement below:
SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    CASE
        WHEN NUM = 0 THEN DECODE(NUM,0,TO_DATE(NULL),TO_DATE(TO_CHAR(NUM), 'YYYYMMDD'))
        ELSE
            SYSDATE
    END AS N
FROM RND;

N       
---------
30-NOV-17
30-NOV-17
30-NOV-17




30-NOV-17
30-NOV-17

9 rows selected.
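For completeness, a less surprising alternative (my sketch, not the original ETL code) is to give the NULL an explicit datatype with CAST, so that the branch agrees with SYSDATE without the TO_DATE(NULL) trick:

```sql
WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    CASE
        -- an explicitly typed NULL leaves no room for ambiguity
        WHEN NUM = 0 THEN CAST(NULL AS DATE)
        ELSE SYSDATE
    END AS N
FROM RND;
```

CAST(NULL AS DATE) says out loud what TO_DATE(NULL) only implies, which arguably makes the intent clearer to the next reader.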

DECODE is not the same as CASE


This is one of the two points which make me not fully agree with a statement Steven Feuerstein made in one of his recent articles (go to "Hey, What about DECODE?" at the bottom).

DECODE behaves differently from the CASE expression, specifically when it comes to dealing with NULL values.
SQL>WITH test_values AS
(
    SELECT A, b
    FROM
    (  
        SELECT 1 A FROM dual
        UNION
        SELECT NULL A FROM dual
    )
    CROSS JOIN
    (
        SELECT 1 b FROM dual
        UNION
        SELECT NULL b FROM dual
    )
)
SELECT
    A,
    b,
        CASE
            WHEN A = b THEN 'EQUAL'
            ELSE 'NOT EQUAL'
        END AS "CASE equality test",
       DECODE(A,b,'EQUAL','NOT EQUAL') AS "DECODE equality test"
FROM
    test_values;

A     B     CASE equality test    DECODE equality test
----- ----- --------------------- --------------------
1     1     EQUAL                 EQUAL             
1     null  NOT EQUAL             NOT EQUAL         
null  1     NOT EQUAL             NOT EQUAL         
null  null  NOT EQUAL             EQUAL            
What we saw in this post had to do with casting datatypes. But there's another reason which sometimes makes me (especially when there are only one or two conditions) choose the geeky, yet concise DECODE function over the human-readable CASE expression: DECODE is capable of comparing NULL values (not merely doing an equality test) and inferring TRUE.
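If you want a CASE expression to mimic DECODE's NULL-equals-NULL behaviour, you have to spell the NULL handling out yourself; a sketch against the same test_values:

```sql
WITH test_values AS
(
    SELECT A, b
    FROM
    (
        SELECT 1 A FROM dual
        UNION
        SELECT NULL A FROM dual
    )
    CROSS JOIN
    (
        SELECT 1 b FROM dual
        UNION
        SELECT NULL b FROM dual
    )
)
SELECT
    A,
    b,
    CASE
        -- the extra predicate makes NULL "equal" to NULL, as DECODE does
        WHEN A = b OR (A IS NULL AND b IS NULL) THEN 'EQUAL'
        ELSE 'NOT EQUAL'
    END AS "CASE with NULL handling"
FROM
    test_values;
```

Verbose, which is exactly why the concise DECODE sometimes wins.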

Friday 22 September 2017

On the burden of unjustifiable human interfaces

"If something doesn't have an API, it doesn't exist!"

For me, that quote from Mitchell Hashimoto represents one idea: if you don't automate access to resources, you'll be extinct.

It's not about cloud, containers, microservices, serverless technologies, or any other 2017 buzzword that you can come up with.

Those are not the aim. Those are simply tools to help you free up access to resources, to help things move faster.

Below is simply an example (albeit a rather specific one) of how inaccessibility of resources can reduce productivity.

1. The problem

 

1.1. The context


A "Trade" data warehouse, running on Oracle 11g database. Moved to Exadata recently.

Multiple types of financial instruments, multiple types of trades, years of data [sub]partitioned daily.

Example of oddities: One of the (badly designed, in my opinion) "dimension" tables holds tens of millions of rows!

Countless hints in every SQL statement, to make hundreds (if not thousands) of queries per hour work on the legacy (i.e. non-Exadata) system.

Prospects of changing the data model: almost zero.

State of Preprod/UAT DB: far less data, and not receiving anywhere near as much new data as Prod.

State of Dev DB: Almost non-existent.

The "DBA" sits in a completely different environment. Might vaguely know about the data model. It's -supposedly- a big corporation.

1.2. The daily workflow

 

Customer complains about queries running slow. The complaint is sent to Dev.

Dev are the people who have written the SQL statements. They know the data model and have good knowledge of Oracle internals. Dev has no access to Cloud Control or the DBA workflow; for example, no idea when the backups last ran, or whether one was running at the time of an incident.

DBA has no visibility over the complaints or DB performance. And there's no point in sending the complaint to the DBA: he can't qualify a SQL statement as "problematic" or "OK" when a query is taking too long to complete, because he doesn't know the data model.

1.3. The example scenario

 

An issue is raised. Dev checks the SQL statement.

As a matter of principle/process, Dev runs the statement on PreProd. PreProd doesn't have the same characteristics as Prod, so the issue doesn't always reappear.

Dev is aware of the qualities of the PreProd DB. Also the customer's not delusional; something IS running slow.

Just imagine that, when it came to executing a join on (e.g.) 10 different row sources, the CBO had (twice) chosen to do Nested Loops and tens of thousands of Index Range Scans on one of the massive tables, because of incorrect stats on the driving table. Can't get clearer than that, can we?
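As an aside, the usual first response to a plan gone bad because of incorrect stats on the driving table would be something like the following (a sketch; the schema and table names are made up):

```sql
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname => 'TRADE_DW',        -- hypothetical schema
        tabname => 'DRIVING_TABLE',   -- hypothetical table
        cascade => TRUE               -- gather the index stats too
    );
END;
/
```

Of course, in this story even that simple step is gated behind the same human interface.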

Trying to establish whether the behavior can re-occur (are we actually dealing with a systematic issue? or just a one-off anomaly?) Dev re-runs the statement on Prod. CBO chooses a "bad" plan, query runs like a dog, can't cancel the query on SQL Developer.

Obviously something's wrong. Any "fix" should be re-examined.

Including the re-examination (if successful, that is) the query has run 3 times on Prod.

SQL Developer is still hanging.

Just a reminder that breaching SLAs in the financial sector can lead to fines of millions of pounds from the regulator, at least!

This has to be fixed quickly.

Dev (the same person who understands the data model, knows what's running on his SQL Developer session, and most probably knows why it's taking too long) asks the DBA to kill the session.

The DBA (not familiar with the data model, unaware of the activity): "Could you raise a ticket, please?"

Tick, Tock, Tick, Tock, Tick, Tock...

2. Red Lines and questions

 

2.1. Prod != PreProd

 

In my opinion that's the root cause of the whole issue.

If you're big enough to be in a sector with SLAs which put you in danger of losing millions of pounds in fines, you should be big enough to have the EXACT SAME environment in PreProd as in Prod. If you don't, you're embarrassing yourself.

If you're not a software vendor and your organisation effectively owns the production data, perhaps you should use Data Virtualisation tools such as Delphix.

However, this factor is simply an easy target to criticise; especially if you don't have to deal with it from a pragmatic angle (because you won't suffer the consequences of missing the SLA, for starters).

If you're not an architect of this system, or you were brought in long after the system became operational, you have to make lemonade using the lemons that life has given you.

2.2. Auditing for the sake of auditing

 

  • Why are "kill this session, please" tickets raised? Are we simply trying to serialize DBA's workload and avoid him getting overwhelmed by a flood of requests every minute? (If yes, perhaps he's controlling too many resources.)

  • Has a "regulator" asked you to do this? If not, is there a clear process of review?
  • What is the value in raising those tickets? 
  • What's the action plan for reviewing those tickets?
  • Can you set up a downstream automated process which processes the raised tickets for later analysis?
  • Is the underlying data in your ticketing system accessible in a programmable way? 
  • Would you be able to chart/categorize your tickets easily? 
  • Can you use your ticketing system to produce quantifiable indexes to measure trends in your organization?
  • How long do you keep those tickets in your system? Why? 
  • How do you ensure that such tickets won't share the fate of the billions of tourist photos taken every day and never EVER revisited?

2.3. Role of DBA

 

  • What kind of DBA deals with such tickets? Is he/she a DR/HA DBA? If they're not application DBAs (who are familiar with the data/business model), why not?
  • What is the process for communicating with the DBA? How many DBs is he responsible for? How many teams/products/services is he responsible for? Does Dev know about his responsibilities? Is he part of the team?
  • How does he contribute to [increasing] productivity? Has he turned into a bottleneck? If so, why? Too many responsibilities? Is he not eager to be part of the product?

2.4. The value of involving DBA in menial tasks

 

  • Why should the DBA deal with such tickets? What's the difference, if Dev does it and it's logged?

If you are a DBA who knows that menial tasks can be automated AND logged in a way that satisfies "process" requirements and you still choose to receive tickets, perhaps you SHOULD feel threatened by the notion of "autonomous DBs"...

The world has changed, everybody is automating menial tasks, anyway!
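To make the point concrete, here's a minimal sketch (entirely hypothetical; the object names are my assumptions) of what an automated, audited alternative to the "kill this session, please" ticket could look like. The statement it runs is Oracle's standard ALTER SYSTEM KILL SESSION syntax:

```sql
-- Hypothetical: an audit trail plus a self-service kill procedure
-- that Dev could be granted EXECUTE on.
CREATE TABLE session_kill_audit
(
    killed_at    TIMESTAMP DEFAULT SYSTIMESTAMP,
    requested_by VARCHAR2(30),
    sid          NUMBER,
    serial#      NUMBER
);

CREATE OR REPLACE PROCEDURE kill_session_logged
(
    p_sid    IN NUMBER,
    p_serial IN NUMBER
)
AS
BEGIN
    -- log first, so the audit trail survives even if the kill fails
    INSERT INTO session_kill_audit (requested_by, sid, serial#)
    VALUES (SYS_CONTEXT('USERENV', 'SESSION_USER'), p_sid, p_serial);

    EXECUTE IMMEDIATE
        'ALTER SYSTEM KILL SESSION ''' || p_sid || ',' || p_serial
        || ''' IMMEDIATE';

    COMMIT;
END;
/
```

Twenty-odd lines, and the request is served immediately AND every kill is recorded with who/when/what; which is everything the ticket was giving you.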

2.5. The value of being involved in "non-menial" tasks

 

  • If you're the only person who's privileged to (e.g.) deploy a package, would you be held responsible if the package causes data loss? If Dev will eventually be held responsible, then why isn't Dev deploying the code?
  • Do you -the DBA- know how to use tools such as Jenkins to enable Dev to do the deployments in a safe manner? Do you want to enable Dev to move things faster, safely? Or do you feel too proud of being the only person who does the deployments?
  • Do you -the DBA- do code reviews? Do you clearly know what the code you're deploying does? If not, then why are you deploying it, anyway?!

Conclusion

 

I'm sure all of us can see the problems hidden in the answers to some of the questions above. Obviously the elephant in the room is the segregation of DBA and the development process.

But if you can't embed the DBA in your development team, my solution is "Provide an API for your resources!"

I'm not sure how many of us can justify the delays caused by having a human interface for a resource which doesn't need one. In my opinion, the notion that "raising a ticket will only take two minutes" is simply not true. Even if the DBA is not too busy to deal with his incoming notifications immediately (is he/she EVER?!), it takes another two minutes to deal with the issue, and another two minutes to reflect the outcome in the ticket; which is subject to all the questions I raised in 2.2.

This means your Dev has had an opportunity to distract himself either with another task, or Instagram! Context/task switching in humans is an extremely costly factor in production, and it's by no means as efficient as context switching in CPUs.

That applies to DBAs too (surprise!), since they're not sent from above. When the DBA receives tens of notifications every minute for requests which could be automated, he/she has to deal with context switching as well.

Are your reasons for keeping the human interfaces to your resources justified? When was the last time you did some soul-searching and re-examined those reasons?