Thursday 7 November 2013

DBA_SEGMENTS Vs. DBA_EXTENTS for calculating data size

If you need to know the size of a specific chunk of data - such as a whole tablespace - you need to use DBA_SEGMENTS!

Here's a simple experiment on tablespaces with 32K Blocks, on ASM and 11.2.0.3, and PARALLEL_DEGREE_POLICY set to MANUAL:

SELECT /*+ PARALLEL(DE,4) */ SUM(BYTES)
FROM DBA_EXTENTS DE
WHERE TABLESPACE_NAME='ACTIX_DATAPART_0001';
-- Result: 8346468352 IN 162.67 SECs

SELECT /*+ PARALLEL(DS,4) */ SUM(BYTES)
FROM DBA_SEGMENTS DS
WHERE TABLESPACE_NAME='ACTIX_DATAPART_0001';
-- Result: 8346468352 IN 3.687 SECs
The SQL statements were run in exactly the order displayed, but to make sure that caching of the underlying objects (like the SYS_OBJECTS view or any other SYS schema object) was not contributing to the timing difference, the shared pool was flushed and the statements were run again in reverse order. Same results.
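For reference, the re-test between runs looked something like this (a sketch - FLUSH SHARED_POOL requires the ALTER SYSTEM privilege, and SET TIMING ON is the SQL*Plus timing switch):

```sql
-- Discard cached cursors and dictionary objects so neither
-- statement benefits from the other's earlier run:
ALTER SYSTEM FLUSH SHARED_POOL;

-- Have SQL*Plus report elapsed time for each statement:
SET TIMING ON
```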

The tablespace used has two datafiles, but the same statements were also tried on single-file tablespaces and, again, gave the same results.

Obviously DBA_EXTENTS would allow you to query based on FILE_ID, which -unfortunately- is crucial when you are interested in the segments/extents on one specific datafile. In my case, I need it for a "datafile defragmentation" proc, which requires the BLOCK_ID of the extents; hence DBA_EXTENTS.
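A sketch of the kind of query such a defragmentation proc would need - extents in one datafile, ordered from the end of the file backwards, so the highest extents can be moved before the file is resized (the FILE_ID value here is just a placeholder):

```sql
-- List extents in a single datafile, highest BLOCK_ID first.
-- This is the information DBA_SEGMENTS cannot give you:
SELECT /*+ PARALLEL(DE,4) */
       OWNER, SEGMENT_NAME, EXTENT_ID, BLOCK_ID, BLOCKS
FROM   DBA_EXTENTS DE
WHERE  TABLESPACE_NAME = 'ACTIX_DATAPART_0001'
AND    FILE_ID = 7          -- placeholder FILE_ID
ORDER  BY BLOCK_ID DESC;
```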

Side note: SUM(BYTES) returns the total size of the blocks allocated to the data, not the actual size of the data itself - although I'm not sure whether that matters or not!

Post-publish-newbie-Observation: Just noticed that DBA_SEGMENTS is entirely based on SYS.SYS_DBA_SEGS, whose HEADER_FILE column represents FILE_ID.

Also, the BYTES column in DBA_EXTENTS/DBA_SEGMENTS is calculated from SYS_DBA_SEGS using the following (ugly looking, but time saving!) formula:

decode(bitand(segment_flags, 131072), 131072, blocks,
       decode(bitand(segment_flags, 1), 1,
              dbms_space_admin.segment_number_blocks(
                  tablespace_id, relative_fno, header_block,
                  segment_type_id, buffer_pool_id, segment_flags,
                  segment_objd, blocks),
              blocks)) * blocksize

Regarding the EXTENT_ID: what you need is in sys.x$ktfbue.ktfbuebno - that is, if you need a precise BLOCK_ID. But if -like me, using a 5MB margin for resizing datafiles- a "more-or-less" precise BLOCK_ID does the job, then your answer lies in SYS.SYS_DBA_SEGS.HEADER_BLOCK.

It is either the same as the BLOCK_ID you'd get from sys.x$ktfbue.ktfbuebno and sys.uet$.block# (that is, DBA_EXTENTS.BLOCK_ID), or it's a mere two blocks higher!
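That claim can be sanity-checked with something like the following sketch, which compares DBA_EXTENTS.BLOCK_ID of each segment's first extent against SYS_DBA_SEGS.HEADER_BLOCK (the join columns are an assumption based on the common DBA_SEGMENTS/DBA_EXTENTS column names; if the claim above holds, DIFF should come back as 0 or 2 for every row):

```sql
-- Compare the first-extent BLOCK_ID with the segment header block:
SELECT DS.OWNER,
       DS.SEGMENT_NAME,
       DE.BLOCK_ID,
       DS.HEADER_BLOCK,
       DS.HEADER_BLOCK - DE.BLOCK_ID AS DIFF
FROM   SYS.SYS_DBA_SEGS DS
JOIN   DBA_EXTENTS DE
  ON   DE.OWNER        = DS.OWNER
 AND   DE.SEGMENT_NAME = DS.SEGMENT_NAME
 AND   DE.EXTENT_ID    = 0   -- first extent of each segment
WHERE  DS.TABLESPACE_NAME = 'ACTIX_DATAPART_0001';
```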

The king is dead! All hail the king!

Sunday 3 November 2013

Oracle and releasing documentation: Do it, or don't!

Something is fundamentally wrong here!

I was watching a great presentation by Carlos Sierra on Adaptive Cursor Sharing, similar to the one he gave at OakTable 2013, which I attended (yes, I chose Carlos' talk over Jonathan Lewis' on 12c histograms! What are you gonna do about it?!), just to remind myself of the critical points made about the subject.

If you have seen the presentation (around 01:05:45+) or were present at his talk, you'd know that there is a certain point in the presentation at which EVERYBODY seems to be confused.

The confusion is about how the row-count ranges that define the three different buckets are calculated. Carlos has explicitly admitted to the issue in his article: "Observations on a test case built on 11.2.0.3 seem to indicate that boundaries for these 3 buckets are: less than 1,000; between 1,000 and 1,000,000; and more than 1,000,000."

Carlos is a person who rightly praises OakTable members using a definition that members -rightly, again- attribute to themselves; i.e. Oracle Scientists. But isn't it a bit strange to call people who study the behaviour of a man-made entity "scientists"?! Would you call the greatest mechanics you know (for example) Ford Scientists?!

And that brings me to my point: I've been privileged to be among the members of this group since the summer of 2013, and I know that "Scientist" is only one of many titles you could choose to describe them with. But the absurdity of the title arises from the unprofessional behaviour of Oracle itself.

I have absolutely no problem with Oracle keeping its documentation secret; after all, they've spent millions of man-hours developing a capable piece of software. But you have to make a choice here: either -for whatever reason you choose to do so- you release ALL your documentation (the likes of which you can find here), or you release none of it.

The fact that many professionals, like the OakTable members, have to do research to find out about the behaviour of a man-made product (as opposed to a natural phenomenon) whose creator already possesses the information about its inner workings, can only make an independent observer more inclined to judge the creator as "yet another corporation" or -in a best-case scenario- as "not able to provide an association between the size of the company and the quality of its product," when it could instead be regarded as an actual player in the field of data science.

Ambiguities such as the ones around ACS can only bolster the position of those who might ask: is Oracle releasing its documentation -the way it has- merely as a PR move?