# BSA Data Extraction

LCLS-Live provides simple functions to extract beam synchronous acquisition (BSA) data from archived HDF5 files. These files are stored on SLAC systems, so the functions only work for users who have access to those systems.


See the documentation: LCLS BEAM SYNCHRONOUS DATASTORE USER GUIDE at:
    https://www.slac.stanford.edu/grp/ad/docs/model/matlab/bsd.html


In [1]:
%load_ext autoreload
%autoreload 2

## BSA snapshots

`bsa_snapshot` is the basic high-level function.

In [2]:
from lcls_live.bsa import bsa_snapshot

In [3]:
?bsa_snapshot

Signature: bsa_snapshot(timestamp, beampath, pvnames=None)
Docstring:
Extract as a snapshot (PV values) nearest a timestamp from a BSA HDF5 file.

Parameters
----------
h5file: str
    BSA HDF5 file with data that includes the timestamp
    
timestamp: pd.DateTime or datetime.datetime
    This must be localized (not naive time)
    
pvnames : list or None
    List of PV names to extract. If None, all PVs in the source file will be extracted.
    Optional, default=None
    
Returns
-------
snapshot: dict
    Dict with:
        'pvdata' : dict of {pv name:pv value}
        'timestamp' : pd.Timestamp, including the nanosecond.
        'source' : Original HDF5 file that the data came from.

Examples
--------
>>>bsa_snapshot('2021-11-11T00:00:00-08:00', 'cu_hxr')
File:      ~/GitHub/lcls-live/lcls_live/bsa.py
Type:      function

In [4]:
%%time
snapshot = bsa_snapshot('2021-12-11T00:00:00-08:00', 'cu_hxr')

snapshot.keys()

CPU times: user 439 ms, sys: 38.6 ms, total: 478 ms
Wall time: 1.28 s


dict_keys(['pvdata', 'timestamp', 'source'])

In [5]:
# The data is a simple dict
pvdata = snapshot['pvdata']
len(pvdata)

1091

In [6]:
# Here are a few keys in the dict
list(pvdata)[0:10]

['ACCL_IN20_300_L0A_A',
 'ACCL_IN20_300_L0A_P',
 'ACCL_IN20_400_L0B_A',
 'ACCL_IN20_400_L0B_P',
 'ACCL_LI21_180_L1X_A',
 'ACCL_LI21_180_L1X_P',
 'ACCL_LI21_1_L1S_A',
 'ACCL_LI21_1_L1S_P',
 'BLD_SYS0_500_ANG_X',
 'BLD_SYS0_500_ANG_Y']

In [7]:
# And some values
for k in list(pvdata)[0:10]:
    print(k, pvdata[k])

ACCL_IN20_300_L0A_A 57.99538201588009
ACCL_IN20_300_L0A_P -0.00917495265120749
ACCL_IN20_400_L0B_A 69.47708616061887
ACCL_IN20_400_L0B_P -2.564164251349297
ACCL_LI21_180_L1X_A 21.016761493860674
ACCL_LI21_180_L1X_P -160.00175793392177
ACCL_LI21_1_L1S_A 111.53502637024258
ACCL_LI21_1_L1S_P -22.394640079900164
BLD_SYS0_500_ANG_X -0.03628770291941744
BLD_SYS0_500_ANG_Y -0.0050121151142197545


In [8]:
# This is the exact timestamp of the extracted data
snapshot['timestamp']

Timestamp('2021-12-11 08:00:00.003286466+0000', tz='UTC')

In [9]:
# And the original HDF5 source file
snapshot['source']

'/gpfs/slac/staas/fs1/g/bsd/BSAService/data/2021/12/11/CU_HXR_20211211_080825.h5'

In [10]:
# Note that some values are nan
pvdata['BLM_UNDH_0235_QDCRAW']    

array(nan)

In [11]:
# Pass a list of PV names to extract. Any PV not present in the file is returned as None.
bsa_snapshot('2021-12-11T00:00:00-08:00', 'cu_hxr', 
             pvnames = ['ACCL_IN20_300_L0A_A', 'ACCL_IN20_300_L0A_P', 'dummy'])

{'pvdata': {'ACCL_IN20_300_L0A_A': array(57.99538202),
  'ACCL_IN20_300_L0A_P': array(-0.00917495),
  'dummy': None},
 'timestamp': Timestamp('2021-12-11 08:00:00.003286466+0000', tz='UTC'),
 'source': '/gpfs/slac/staas/fs1/g/bsd/BSAService/data/2021/12/11/CU_HXR_20211211_080825.h5'}
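
As the outputs above show, a `pvdata` dict can contain `nan` values and, for requested PVs that are absent from the file, `None`. The following is a minimal sketch of filtering both out before further analysis. The helper name `drop_missing` is hypothetical and not part of `lcls_live`, and the sketch assumes every remaining value is a scalar (a Python float or a 0-d NumPy array, as in the outputs above).

```python
import math


def drop_missing(pvdata):
    """Hypothetical helper (not part of lcls_live): drop None and NaN entries."""
    clean = {}
    for name, value in pvdata.items():
        if value is None:
            continue  # PV was requested but is not present in the file
        number = float(value)  # works for Python floats and 0-d NumPy arrays
        if math.isnan(number):
            continue  # PV is present but holds no valid reading
        clean[name] = number
    return clean


# Stand-in data shaped like the snapshot output above
example = {
    'ACCL_IN20_300_L0A_A': 57.99538202,
    'BLM_UNDH_0235_QDCRAW': float('nan'),
    'dummy': None,
}
print(drop_missing(example))
```

With a real snapshot, the call would be `drop_missing(snapshot['pvdata'])`.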

## Notes on timestamps

Timestamps must be localized, i.e. carry time zone information. Without it, the time to extract is ambiguous, because the internal data files and directories are named and described in UTC only.

See: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html


In [12]:
# The timestamp must be localized, so this will fail:
try:
    bsa_snapshot('2021-12-11T00:00:00', 'cu_hxr')
except Exception as ex:
    print(ex)

Cannot convert tz-naive Timestamp, use tz_localize to localize


In [13]:
import datetime
# This is not localized:
datetime.datetime(2021, 12, 1, 17, 7, 49)

datetime.datetime(2021, 12, 1, 17, 7, 49)

In [14]:
# but this is:
dtime = datetime.datetime(2021, 12, 1, 17, 7, 49, tzinfo=datetime.timezone.utc)
dtime

datetime.datetime(2021, 12, 1, 17, 7, 49, tzinfo=datetime.timezone.utc)

In [15]:
# And will work with bsa_snapshot
bsa_snapshot(dtime, 'cu_hxr')['timestamp']

Timestamp('2021-12-01 17:07:49.002202872+0000', tz='UTC')
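
As a further illustration of the localization requirement, here is a sketch using only the standard library. It builds a timestamp with a fixed UTC-08:00 offset, checks that it matches the ISO string used earlier in this notebook, and converts it to UTC. The helper `is_localized` is a hypothetical name introduced here, not part of `lcls_live`.

```python
from datetime import datetime, timedelta, timezone


def is_localized(dt):
    """Hypothetical helper: True if the datetime carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


# Fixed offset of UTC-08:00, matching the '-08:00' suffix used above
pst = timezone(timedelta(hours=-8))
local = datetime(2021, 12, 11, 0, 0, 0, tzinfo=pst)

# The same instant parsed from the ISO 8601 string
parsed = datetime.fromisoformat('2021-12-11T00:00:00-08:00')

# Convert to UTC, the convention used for the BSA files and directories
as_utc = local.astimezone(timezone.utc)

print(is_localized(datetime(2021, 12, 11)))  # False: naive
print(is_localized(local))                   # True
print(parsed == local)                       # True
print(as_utc.isoformat())                    # 2021-12-11T08:00:00+00:00
```

The UTC result agrees with the 08:00 UTC timestamp returned by `bsa_snapshot` for the same input string.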

## Helper functions

In [16]:
from lcls_live.bsa import bsa_h5file, BSA_DATA_SEARCH_PATHS

In [17]:
# These are the paths searched.
BSA_DATA_SEARCH_PATHS

['/gpfs/slac/staas/fs1/g/bsd/BSAService/data/',
 '/nfs/slac/g/bsd/BSAService/data/']
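
Because these directories exist only on SLAC systems, it can help to check whether any of them is reachable before calling the extraction functions. The sketch below copies the two paths from the output above; `first_available_path` is a hypothetical helper, not part of `lcls_live`, and how the library itself walks this list is not shown here.

```python
import os

# Copied from the notebook output above; these exist only on SLAC systems
BSA_DATA_SEARCH_PATHS = [
    '/gpfs/slac/staas/fs1/g/bsd/BSAService/data/',
    '/nfs/slac/g/bsd/BSAService/data/',
]


def first_available_path(paths):
    """Hypothetical helper: return the first existing directory, or None."""
    for path in paths:
        if os.path.isdir(path):
            return path
    return None


root = first_available_path(BSA_DATA_SEARCH_PATHS)
if root is None:
    print('No BSA data directory is reachable from this machine')
else:
    print('BSA data found under', root)
```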

In [18]:
# Find the appropriate file
bsa_h5file('2021-12-11T00:00:00-08:00', 'cu_hxr')

'/gpfs/slac/staas/fs1/g/bsd/BSAService/data/2021/12/11/CU_HXR_20211211_080825.h5'
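
According to the `bsa_h5file` docstring, the file name follows the pattern `'{beampath}_{time_str}.h5'` with `time_str` in the format `'%Y%m%d_%H%M%S'`, and the time is the UTC datestamp of the end of the data taking period. A minimal sketch of decoding the name under that assumption follows; `parse_bsa_filename` is a hypothetical helper, not part of `lcls_live`.

```python
import os
from datetime import datetime, timezone


def parse_bsa_filename(path):
    """Hypothetical helper: return (beampath, end time in UTC) from a BSA file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # The beampath itself contains an underscore, so split from the right
    beampath, date_str, time_str = stem.rsplit('_', 2)
    end_time = datetime.strptime(date_str + '_' + time_str, '%Y%m%d_%H%M%S')
    return beampath, end_time.replace(tzinfo=timezone.utc)


path = '/gpfs/slac/staas/fs1/g/bsd/BSAService/data/2021/12/11/CU_HXR_20211211_080825.h5'
beampath, end_time = parse_bsa_filename(path)
print(beampath)              # CU_HXR
print(end_time.isoformat())  # 2021-12-11T08:08:25+00:00
```

The decoded end time, 08:08:25 UTC, falls shortly after the requested 08:00:00 UTC, which is consistent with the file containing that timestamp.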

In [19]:
?bsa_h5file

Signature: bsa_h5file(timestamp, beampath)
Docstring:
Finds the BSA HDF5 file that contains the timestamp for a given beampath

BSA data files are named as:
    CU_SXR_20211210_140742.h5
    
Which corresponds to '{beampath}_{time_str}.h5' with time_str in the format: '%Y%m%d_%H%M%S'
    
See the documentation in:
    https://www.slac.stanford.edu/grp/ad/docs/model/matlab/bsd.html
     "The data files are named with the UTC datestamp of the END of their data taking period"
     
Parameters
----------

timestamp: pd.DateTime or datetime.datetime
    This must be localized (not naive time)

beampath : str
        one of ['cu_hxr', 'cu_sxr'] (case independent)
    
Returns
-------
h5file : str
    Full path to the HDF5 file that should contain the time. 
        
        
Examples
--------

>>> bsa_h5file('2021-12-11T00:00:00-08:00', 'cu_hxr')
'/gpfs/slac/staas/fs1/g/bsd/BSAService/dat