Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

BOX - beginning of expedition
EOX - end of expedition

The DBA is responsible for shore EOX processing.

...

This expedition is ended. Within two weeks, science participants expect to have moratorium access to LIMS content. DBA, Systems/MCS, and sometimes Programmers need to coordinate to help this happen. The Database Librarian needs to know when these changes occur too.

Make sure we are getting as a complete a copy of the shipboard data to be warehoused on shore as we can.

0. Notify that the data load is in progress

...

  • What database instance are you connecting to? You must be at a command-line for that ODA instance.
  • As the OS oracle user specify which database instance you are using with this syntax

    Code Block
    . oraenvORACLEoraenv
    ORACLE_SID = [oracle] ? LIMSHQ (or RTLIMS or LIMS19)


...

  • This script imported 398 content over multiple tests between 30 and 40 minutes.
    Additional data review scripts take another 10-15 minutes to run.
  • The script requires no TRANSFER schema preparation--all the tables required are present, the import methodology truncates the tables before importing new content.
  • The content is imported directly from SCHEMAS exported on ship (LIMS, DESCINFO2, OPS, GEODCAT, GEODP###, etc.).
    There is no duplication of keys incurred by the ETL cycle now in use.

3. Establish moratorium credentials and controls

For the new expedition data. - DBA

Pick a name and credential. The account is shared by a science party for the moratorium period. The role created with the same name ensures that science party is granted the exception to see its moratorium data.

eox_gen_in_moratorium_script.ksh [moratorium-group]

This script accepts a moratorium name, e.g. SHACKLETON, SANTORINI, etc.

It generates another script (below) that will be run to modify NEW_TEST, NEW_RESULT, and NEW_GEOD_DATA tables for moratorium concerns.

  • If a moratorium group is not provided, the script will prompt.
  • The script does not check these requirement
    • The GROUP_NAME field that this string populates is varchar2(10)--10 characters or less. It is conventional to use a memorable uppercase name. Often the EPMs will provide both a name and a credential.
    • The string must also be a valid name for a ROLE. Example: should not start with a number; must be valid for naming a Unix file.

in_moratorium_[moratorium-group].ksh

Running this script

  • Ensures the current expedition content is tagged for moratorium control.
  • Do review the script. Sometimes the EOX expedition number is cleared before running eox_gen_in_moratorium_script.ksh.

Upon running the script--wait.

  • The NEW_RESULT table typically contains 20-30 million rows. This update takes the longest.
  • The NEW_TEST and NEW_SAMPLE table are much faster due to being 3 orders of magnitude smaller.
  • Timing is turned on--the scripts will show how long they took to run.

...


Code Block
titleimport_transfer.ksh
collapsetrue
EOX_EXP_DIR=`sqlplus -s / as sysdba << EOT
	set pages 0 feedback off
	SELECT directory_path FROM dba_directories WHERE directory_name = 'EOX_EXP_DIR';
	exit
EOT`
# DIRECTORY_PATH
# --------------------------------------------------------------------------------
# /backup/eox
if [ ! -d $EOX_EXP_DIR ]
then
	echo $EOX_EXP_DIR not found...
	exit 1
fi

if [ $# -ne 1 ]
then
	echo Enter dump file name
	read DUMP_FILE_NAME
else
	DUMP_FILE_NAME=$1
fi
if [ ! -r $EOX_EXP_DIR/$DUMP_FILE_NAME ]
then
	echo $EOX_EXP_DIR/$DUMP_FILE_NAME not found
	exit 2
fi

DT=`date +"%m%d%Y"`
SCHEMA_NAME=transfer
LOG_FILE_NAME=${SCHEMA_NAME}_import_${DT}.log
# Import LIMS, DESCINFO2, and OPS tables into target TRANSFER schema.
impdp system/$s directory=eox_exp_dir DUMPFILE=$DUMP_FILE_NAME LOGFILE=${LOG_FILE_NAME} VERSION=LATEST PARFILE=eox_impdp_to_transfer.par

while [ ${EOX}1 == "1" ]
do
  echo Enter EOX#
  read EOX
done
if [ ${EOX}1 == "1" ]
then
  echo EOX is not set.
  exit 1
fi

sql / as sysdba <<+++
drop user geodcat${EOX} cascade;
drop user geodp${EOX} cascade;
exit;
+++

# Import GEODCAT to an GEODCAT### where ### is the expedition being processed.
SCHEMA_NAME=geodcat${EOX}
LOG_FILE_NAME=${SCHEMA_NAME}_import_${DT}.log
impdp system/$s directory=eox_exp_dir DUMPFILE=$DUMP_FILE_NAME LOGFILE=${LOG_FILE_NAME} VERSION=LATEST CONTENT=ALL SCHEMAS=GEODCAT REMAP_SCHEMA=GEODCAT:${SCHEMA_NAME}

# Import GEODP### where ### is the expedition being processed.
SCHEMA_NAME=geodp${EOX}
LOG_FILE_NAME=${SCHEMA_NAME}_import_${DT}.log
impdp system/$s directory=eox_exp_dir DUMPFILE=$DUMP_FILE_NAME LOGFILE=${LOG_FILE_NAME} VERSION=LATEST CONTENT=ALL SCHEMAS=${SCHEMA_NAME}

# TBD--revise to log this output to a single file for reference after data modifications and copying.
sql / as sysdba <<+++
set newpage 0
set linesize 100
set pagesize 0
set timing on
set time on
set feedback on
set heading on
set trimspool on
set trimout on
set tab off

/* These indexes help speed up later bulk data manipulation and merge activites. */
drop index transfer.new_result_idx;
drop index transfer.new_result_result_number;
drop index transfer.new_test_idx;
drop index transfer.new_test_test_number;
drop index transfer.new_sample_sample_number;
create bitmap index transfer.new_result_idx on transfer.new_result ( sample_number, test_number, result_number ) tablespace transfer parallel;
create unique index transfer.new_result_result_number on transfer.new_result (result_number) tablespace transfer parallel;
create bitmap index transfer.new_test_idx on transfer.new_test ( sample_number, test_number ) tablespace transfer parallel;
create unique index transfer.new_test_test_number on transfer.new_test (test_number) tablespace transfer parallel;
create unique index transfer.new_sample_sample_number on transfer.new_sample (sample_number) tablespace transfer parallel;

/* Review the content loaded in TRANSFER in finer detail than the import activity. */
-- Review the sample, test, result content just loaded to TRANSFER
alter session set current_schema=transfer;
select 'REVIEW TRANSFER.NEW_SAMPLE' from dual;
select s.group_name, s.x_expedition, s.status, count(*) from transfer.new_sample s where s.x_expedition in ('${EOX}', 'QAQC', '999') group by s.group_name, s.x_expedition, s.status order by s.x_expedition, s.group_name, s.status;
select 'REVIEW TRANSFER.NEW_TEST' from dual;
select t.group_name, s.x_expedition, t.status, count(*) from transfer.new_sample s, transfer.new_test t where s.sample_number=t.sample_number and s.x_expedition in ('${EOX}', 'QAQC', '999') group by t.group_name, s.x_expedition, t.status order by s.x_expedition, t.group_name, t.status;
select 'REVIEW TRANSFER.NEW_RESULT' from dual;
select r.group_name, s.x_expedition, r.status, count(*) from transfer.new_sample s, transfer.new_result r where s.sample_number=r.sample_number and s.x_expedition in ('${EOX}', 'QAQC', '999') group by r.group_name, s.x_expedition, r.status order by s.x_expedition, r.group_name, r.status;

-- Compare to content in LIMS
alter session set current_schema=lims;
select 'REVIEW LIMS.SAMPLE' from dual;
select s.group_name, s.x_expedition, s.status, count(*) from lims.sample s where s.x_expedition in ('${EOX}', 'QAQC', '999') group by s.group_name, s.x_expedition, s.status order by s.x_expedition, s.group_name, s.status;
select 'REVIEW LIMS.TEST' from dual;
select t.group_name, s.x_expedition, t.status, count(*) from lims.sample s, lims.test t where s.sample_number=t.sample_number and s.x_expedition in ('${EOX}', 'QAQC', '999') group by t.group_name, s.x_expedition, t.status order by s.x_expedition, t.group_name, t.status;
select 'REVIEW LIMS.RESULT' from dual;
select r.group_name, s.x_expedition, r.status, count(*) from lims.sample s, lims.result r where s.sample_number=r.sample_number and s.x_expedition in ('${EOX}', 'QAQC', '999') group by r.group_name, s.x_expedition, r.status order by s.x_expedition, r.group_name, r.status;

-- Gather stats for the TRANSFER schema
exec dbms_stats.gather_schema_stats('TRANSFER', cascade=>TRUE);

-- Show info about tables imported from data pump export
select table_name, num_rows, avg_row_len, blocks, empty_blocks, owner from all_tables where owner='TRANSFER' order by owner, num_rows desc;

exit
+++

and its companion

Code Block
titleeox_impdp_to_transfer.par
collapsetrue
CONTENT=DATA_ONLY
TABLE_EXISTS_ACTION=TRUNCATE

# Fully load all tables required.
# Complex filtering was problematic, so handling it post import.
# Nota bene--DESCINFO2 tables on ship will be retired within 2023.


# LIMS
TABLES=LIMS.FILES, LIMS.SAMPLE, LIMS.TEST, LIMS.RESULT, LIMS.X_EXTERNAL_KEYS, LIMS.X_SCALE, LIMS.X_HOLE_TO_SCALE, LIMS.X_SAMPLE_DEPTH, LIMS.SCORS_AFFINE, LIMS.SCORS_AFFINE_DETAIL, LIMS.SCORS_SPLICE, LIMS.SCORS_SPLICE_DETAIL, LIMS.X_LIME_AUDIT_MAIN, LIMS.X_LIME_AUDIT_DETAIL, LIMS.GEOD_METADATA, LIMS.GEOD_DATA

REMAP_SCHEMA=LIMS:TRANSFER

REMAP_TABLE=LIMS.FILES:NEW_FILES, LIMS.SAMPLE:NEW_SAMPLE, LIMS.TEST:NEW_TEST, LIMS.RESULT:NEW_RESULT, LIMS.X_EXTERNAL_KEYS:NEW_X_EXTERNAL_KEYS, LIMS.X_SCALE:NEW_X_SCALE, LIMS.X_HOLE_TO_SCALE:NEW_X_HOLE_TO_SCALE, LIMS.X_SAMPLE_DEPTH:NEW_X_SAMPLE_DEPTH, LIMS.SCORS_AFFINE:NEW_SCORS_AFFINE, LIMS.SCORS_AFFINE_DETAIL:NEW_SCORS_AFFINE_DETAIL, LIMS.SCORS_SPLICE:NEW_SCORS_SPLICE, LIMS.SCORS_SPLICE_DETAIL:NEW_SCORS_SPLICE_DETAIL, LIMS.X_LIME_AUDIT_MAIN:NEW_X_LIME_AUDIT_MAIN, LIMS.X_LIME_AUDIT_DETAIL:NEW_X_LIME_AUDIT_DETAIL, LIMS.GEOD_METADATA:NEW_GEOD_METADATA, LIMS.GEOD_DATA:NEW_GEOD_DATA


# DESCINFO2
# DEPRECATED--The need for this segment will go away within the year. DESCINFO2 tables on ship will be retired within 2023.
TABLES=DESCINFO2.ATTRIBUTES, DESCINFO2.FONTS, DESCINFO2.GROUPS, DESCINFO2.OBSERVABLES, DESCINFO2.PARAMS, DESCINFO2.PATTERNS, DESCINFO2.SOFTWARECONFIG, DESCINFO2.SUBSYSTEMS, DESCINFO2.SYMBOLS, DESCINFO2.VALUELISTS

REMAP_SCHEMA=DESCINFO2:TRANSFER

REMAP_TABLE=DESCINFO2.ATTRIBUTES:NEW_ATTRIBUTES, DESCINFO2.FONTS:NEW_FONTS, DESCINFO2.GROUPS:NEW_GROUPS, DESCINFO2.OBSERVABLES:NEW_OBSERVABLES, DESCINFO2.PARAMS:NEW_PARAMS, DESCINFO2.PATTERNS:NEW_PATTERNS, DESCINFO2.SOFTWARECONFIG:NEW_SOFTWARECONFIG, DESCINFO2.SUBSYSTEMS:NEW_SUBSYSTEMS, DESCINFO2.SYMBOLS:NEW_SYMBOLS, DESCINFO2.VALUELISTS:NEW_VALUELISTS


# OPS
#TABLES=OPS.RIS_DATA

#REMAP_SCHEMA=OPS:TRANSFER

#REMAP_TABLE=OPS.RIS_DATA:NEW_RIS_DATA


3. Establish moratorium credentials and controls

For the new expedition data. - DBA

Pick a name and credential. The account is shared by a science party for the moratorium period. The role created with the same name ensures that science party is granted the exception to see its moratorium data.

eox_gen_in_moratorium_script.ksh [moratorium-group]

This script accepts a moratorium name, e.g. SHACKLETON, SANTORINI, etc.

It generates another script (below) that will be run to modify NEW_TEST, NEW_RESULT, and NEW_GEOD_DATA tables for moratorium concerns.

  • If a moratorium group is not provided, the script will prompt.
  • The script does not check these requirement
    • The GROUP_NAME field that this string populates is varchar2(10)--10 characters or less. It is conventional to use a memorable uppercase name. Often the EPMs will provide both a name and a credential.
    • The string must also be a valid name for a ROLE. Example: should not start with a number; must be valid for naming a Unix file.


Code Block
titleeox_gen_in_moratorium_script.ksh
collapsetrue
DT=`date +"%Y%m%d_%H%M%S"`
LOG1=upd1_result_grp_name_${DT}.txt
LOG2=upd2_result_grp_name_${DT}.txt
echo "Started at `date`" 
if [ $# -ne 1 ]
then
	echo Enter GRP_NM
	read GRP_NM
else
	GRP_NM=$1
fi
KSH_SCRIPT=in_moratorium_${GRP_NM}.ksh
echo "sql / as sysdba <<+++" >${KSH_SCRIPT}
echo "alter session set current_schema=transfer;" >>${KSH_SCRIPT}
echo "set echo on" >>${KSH_SCRIPT} 
echo "set pagesize 0" >>${KSH_SCRIPT} 
echo "set linesize 132" >>${KSH_SCRIPT} 
echo "set timing on" >>${KSH_SCRIPT} 
echo "set heading off" >>${KSH_SCRIPT} 
echo "spool in_moratorium_${GRP_NM}.log" >>${KSH_SCRIPT} 
echo "update transfer.new_result set group_name='${GRP_NM}' where group_name='DEFAULT' and analysis not in ('BHA','BIT','DEPLETED','DRILLING','LATLONG','ORIGDEPTH','ROUTING','SCALINPUTS') and sample_number in (select sample_number from transfer.new_sample where x_expedition='${EOX}');" >>${KSH_SCRIPT}
echo "commit;" >>${KSH_SCRIPT}
echo "update transfer.new_test set group_name='${GRP_NM}' where group_name='DEFAULT' and analysis not in ('BHA','BIT','DEPLETED','DRILLING','LATLONG','ORIGDEPTH','ROUTING','SCALINPUTS') and sample_number in (select sample_number from transfer.new_sample where x_expedition='${EOX}');" >>${KSH_SCRIPT}
echo "commit;" >>${KSH_SCRIPT}
echo "update transfer.new_result set group_name='DEFAULT' where rowid in (select t.rowid from transfer.new_sample s, transfer.new_result t where s.sample_number=t.sample_number and s.x_expedition in ('QAQC','999'));" >>${KSH_SCRIPT}
echo "commit;" >>${KSH_SCRIPT}
echo "update transfer.new_test set group_name='DEFAULT' where rowid in (select t.rowid from transfer.new_sample s, transfer.new_test t where s.sample_number=t.sample_number and s.x_expedition in ('QAQC','999'));" >>${KSH_SCRIPT}
echo "commit;" >>${KSH_SCRIPT}
echo "select 'REVIEW CHANGED ROWS' from dual;" >>${KSH_SCRIPT}
echo "select count(*), r.group_name, s.x_expedition from transfer.new_sample s, transfer.new_result r where s.sample_number=r.sample_number group by r.group_name, s.x_expedition;" >>${KSH_SCRIPT}
echo "select count(*), t.group_name, s.x_expedition from transfer.new_sample s, transfer.new_test t where s.sample_number=t.sample_number group by t.group_name, s.x_expedition;" >>${KSH_SCRIPT}
echo "exit;" >>${KSH_SCRIPT} 
echo "+++" >>${KSH_SCRIPT} 
chmod 777 ${KSH_SCRIPT}
echo "Completed at `date`"


in_moratorium_[moratorium-group].ksh

Running this script

  • Ensures the current expedition content is tagged for moratorium control.
  • Do review the script. Sometimes the EOX expedition number is cleared before running eox_gen_in_moratorium_script.ksh.

Upon running the script--wait.

  • The NEW_RESULT table typically contains 20-30 million rows. This update takes the longest.
  • The NEW_TEST and NEW_SAMPLE table are much faster due to being 3 orders of magnitude smaller.
  • Timing is turned on--the scripts will show how long they took to run.

The result table took 2-5 minutes to complete (with indexes in place) for each of ORTLIMS, LIMS19, RTLIMS, and LIMSHQ.

Code Block
titlein_moratorium_GROUPNAME.ksh
collapsetrue
sql -s / as sysdba <<+++
alter session set current_schema=transfer;
set echo on
set pagesize 0
set linesize 132
set timing on
set heading off
spool in_moratorium_GROUPNAME.log
update transfer.new_result set group_name='GROUPNAME' where group_name='DEFAULT' and analysis not in ('BHA','BIT','DEPLETED','DRILLING','LATLONG','ORIGDEPTH','ROUTING','SCALINPUTS') and sample_number in (select sample_number from transfer.new_sample where x_expedition='###');
commit;
update transfer.new_test set group_name='GROUPNAME' where group_name='DEFAULT' and analysis not in ('BHA','BIT','DEPLETED','DRILLING','LATLONG','ORIGDEPTH','ROUTING','SCALINPUTS') and sample_number in (select sample_number from transfer.new_sample where x_expedition='###');
commit;
update transfer.new_result set group_name='DEFAULT' where rowid in (select t.rowid from transfer.new_sample s, transfer.new_result t where s.sample_number=t.sample_number and s.x_expedition in ('QAQC','999'));
commit;
update transfer.new_test set group_name='DEFAULT' where rowid in (select t.rowid from transfer.new_sample s, transfer.new_test t where s.sample_number=t.sample_number and s.x_expedition in ('QAQC','999'));
commit;
select 'REVIEW CHANGED ROWS' from dual;
select count(*), r.group_name, s.x_expedition from transfer.new_sample s, transfer.new_result r where s.sample_number=r.sample_number group by r.group_name, s.x_expedition;
select count(*), t.group_name, s.x_expedition from transfer.new_sample s, transfer.new_test t where s.sample_number=t.sample_number group by t.group_name, s.x_expedition;
exit;
+++


crt_user_exp.sql

Review the script.

...