Introduction
------------
EBF, which stands for Efficient and Easy to use Binary Format, is a
binary file format for reading and writing binary data easily. Reading
and writing routines are currently available in C, C++, Fortran, Java,
Python, IDL and MATLAB. A program called *ebftk* (formerly *ebftkpy*),
which provides a set of utility functions for working with .ebf files,
e.g., viewing the contents and getting a summary, is also provided. The
EBF specification is designed to be concise and easy to understand, so
that others can write their own code if needed. It is also designed to
simplify the programming of input/output routines in different
programming languages.

In a nutshell, an EBF file is a collection of data objects. Each data
object is identified by a unique name, and a single file can hold
multiple data objects. Each data object is preceded by a header
(metadata) which describes the binary data associated with it. Among
other things, this header allows the files to be portable across
systems with different endianness. In EBF, binary data is always
written in the native endian format of the writing machine (either
little or big), and byte swapping is done while reading only if needed.
This makes writing data fast and eliminates unnecessary swapping when
reading and writing are done on the same architecture.

EBF uses the row-major format for multidimensional arrays, as in C/C++.
Note that Fortran, IDL and MATLAB use the column-major format.

- Store multiple data items in one file, each having a unique tag name

  + tag names follow the convention of Unix-style pathnames, e.g.
    /x or /mydata/x
  + this allows hierarchical storage of data

- Constant time lookup of items due to use of an inbuilt hash table

  + one can store as many data items as one wants without any slowdown

- Automatic type and endian conversion
- Support for multiple programming languages

  + data can easily be read in C, C++, Fortran, Java, IDL and MATLAB
  + facilitates easy distribution of data

- Structures supported in Python and IDL

  + nested structures are also supported

File Format
-----------
An EBF file is a collection of data objects. Each data object has two
parts

* Header - this contains the metadata
* Binary data - the actual data

.. figure:: ebf_format.svg

Hash table
----------------
EBF has an inbuilt hash table for locating items. To access this table
the ebf library makes use of two data objects

* "/.ebf/info"
* "/.ebf/htable"

These are automatically created by the ebf library.

Type Codes
-----------
Each data type has an integer code associated with it. The table below
lists the supported data types and their integer codes.

==============  =========  ======================
Data type code  Data type  Data size (bytes)
==============  =========  ======================
0               undefined  0
1               char       1
2               int32      4
3               int64      8
4               float32    4
5               float64    8
6               int16      2
7               float16    2
8               structure  Implementation defined
9               int8       1
10              uint8      1
11              uint16     2
12              uint32     4
13              uint64     8
==============  =========  ======================

Tag name Conventions
---------------------
* Tag names (e.g. "/simulation1/px") are analogous to Unix filenames
  and begin with "/", implying the root directory. This allows for
  hierarchical arrangement of data.
* Tag names are case insensitive.
* Tag names should be alphanumeric (A to Z, 0-9) and can include the
  underscore character. The first character should be alphabetic to
  allow compatibility across different programming languages (e.g. IDL).
* Tag names containing the '.' (dot) character are reserved for special
  use and should be avoided.
  They are ignored when performing operations such as copying, diff
  and so on.
* The '/log' tag name is used to record how the data was created,
  written as a string of characters.
* Although the format specification allows for an array of structures,
  support for this is presently only available in Python and IDL. Such
  cases generally arise when writing tables where each column can
  potentially be of a different data type. A more user-friendly way of
  writing a table is to write each column as a separate data object.
  This allows selective retrieval of columns and also allows new
  columns to be added to the data file.

EBF toolkit *ebftk* (formerly *ebftkpy*)
----------------------------------------
*ebftk* is a Python script which contains a set of utility functions to
manipulate EBF-formatted files. In EBF files each data item has a
unique tag name which begins with "/", e.g., "/FeH", "/Pos3" etc. Data
can be queried using tag names. *ebftk* can be used to get information
about the data (tag names) in a file, to print the data in ASCII or CSV
format using tag names, to rename items, to remove items and so on.

Installation
^^^^^^^^^^^^^
The *ebftk* and *ebfconvert* scripts are installed automatically when
the Python ebf module is installed. Using setup.py, the Python ebf
module is installed in a standard location, but where the scripts are
installed can be controlled by specifying *--install-scripts=mypath*.
Here mypath can be */usr/local/bin/* or somewhere in your home
directory where you keep your programs, e.g., */home/user/sw/bin/*.
With the *--user* option the scripts are generally installed in
*~/.local/bin/*. If mypath is not in your search path then you need to
add it. Note that if the Python numpy module is not installed (although
it is quite standard to have it) you will also have to install it.
::

    $ pip install ebfpy
    OR
    $ pip install ebfpy --user

Alternatively, install from the source tarball::

    $ tar -zxvf ebfpy_x.x.x.tar.gz
    $ cd ebfpy_x.x.x
    $ python setup.py install --user
    OR
    $ python setup.py install --user --install-scripts=mypath
    OR
    $ python setup.py install --install-scripts=mypath

If *mypath* is not already in your search path then you can add it as
follows.

::

    For bash shell: in ~/.bashrc file add
    export PATH=$PATH:/home/username/mypath

    For tcsh or csh: in ~/.cshrc file add
    set path = ($path /home/username/mypath)

Usage
^^^^^^
::

    NAME:
        >>EBF<< (Efficient and Easy to use Binary File Format)
        ebftkpy 0.0.1 - a toolkit for EBF files
        Copyright (c) 2012 Sanjib Sharma
    USAGE:
        ebftk -list filename
        ebftk filename (same as -list)
        ebftk -cat filename "TagName1 TagName2 .."
        ebftk -csv filename "TagName1 TagName2 .."
        ebftk -ssv filename "TagName1 TagName2 .."
        ebftk -stat filename "TagName1 TagName2 .."
        ebftk -swap filename
        ebftk -copy src_file dest_file
        ebftk -copy src_file dest_file TagName
        ebftk -diff filename1 filename2
        ebftk -rename filename1 tagname_old tagname_new
        ebftk -remove filename1 tagname
        ebftk -htab filename
    DESCRIPTION:
        -list    view headers/TagNames of data in file
        -cat     print data in ascii format
                 e.g., for "TagName1" a record of rank 2 with
                 dimensions N and 3 will print a Nx3 table,
                 for "TagName2" a record of rank 1 with dimension N
                 will print a column of size N
                 multiple tags can be specified as space separated
                 strings as "TagName1 TagName2"
                 but the condition is that the number of elements in
                 each record should be same. This will print a Nx4 table
        -csv     print data in csv tabular format, syntax same as cat
        -ssv     print data in csv tabular format, but delimiter as space
        -stat    print min max mean stddev of specified data tags
        -swap    swap the endianness of a file, output file has
                 suffix _swap.ebf
        -copy    copy contents of one file to another or only a tag
        -diff    difference of two data items in two ebf files
        -rename  rename a data item
        -remove  remove a data item.
                 It is renamed with prefix /.tr/ which can be
                 restored using rename if needed
        -htab    get information about internal hashtable
    CONTACT:
        http://ebfformat.sourceforge.net
        or bugsanjib at gmail

An example printout of the command *ebftk file.ebf* is given below::

    ------------------------------------------------------------------
    name                           dtype     endian  unit       dim
    ------------------------------------------------------------------
    /log                           char      little             [1552]
    /typelist                      int32     little             [1]
    /typelist1                     int32     little             [114]
    /pos3                          float32   little             [163 3]
    /vel3                          float32   little             [163 3]

Converting ASCII files to EBF *ebfconvert*
------------------------------------------
*ebfconvert* is a Python script to convert ASCII files to EBF files.
Limited support for converting FITS files is also available. It can
also handle large files that are difficult to fit in memory.

Usage
^^^^^^
::

    NAME:
        >>EBF<< (Efficient and Easy to use Binary File Format)
        ebfconvert 0.0.2 - converts ascii and fits files to EBF format
        Can also handle very large files.
        Copyright (c) 2013 Sanjib Sharma
    USAGE:
        ebfconvert [OPTIONS] filename
        ebfconvert [OPTIONS] "prefix*suffix"
        wild card must be in double quotes
    DESCRIPTION:
        Outfile name is constructed from filename with suffix
        replaced by .ebf
        Files with suffix .fits .fit or .fts are assumed to be fits
        rest are treated as ascii
        Empty lines and lines beginning with # are ignored.
        Following delimiters are allowed (tab, space, comma, semicolon).
        First non commented line determines the delimiter.
        Tab and spaces can be mixed but not others.
        The program tries to guess datatype from data and chooses
        from float64, int64 and string
        Default column names are of form 'col'+str(i).
        If items in first non-commented lines are all strings
        then they are treated as field names.
        For explicit control over formatting, the following lines
        can be added before the start of data
        #fields =[name_1 , name_2, name_3, ...name_n]
        #datatypes=[float32, int64, S , ...float64]
        #units =[m/s , , kg , ...m*kg^{-2}]
        for fixed format data one can specify widths of fields as
        #widths =[5 , 4 , 10 , ...12]
        instead of datatypes and widths one can also provide
        #format ="%-8.2f%+3.2e %10i %#3u %04d"
        This is c printf format %[flags][width][.precision]type
        Note spaces between two % changes width of fields
        Allowed formats codes (e, E, f, F, g, G, i, u, d, s)
        format and width if present should not have blank members.
        fields, datatypes, units can have blank members.
        Allowed datatypes are (int8, int16, int32, int64, uint8,
        uint16, uint32, uint64, float32, float64, S)
    OPTIONS:
        --join=join_file
            When using wildcard this option joins multiple files
            into one. All files to be joined should have same
            column names.
        --schema=schema_file
            This is an alternate way to explicitly specify formatting
            An example schema file is given below. Each field is
            separated by comma. A field contains space separated
            entities name datatype width unit. The width is for
            fixed width format only, for other formats set it to 0.
            (ra float32 0 degree, dec float32 , velocity float32 0 )
        -struct
            will write data as numpy structure
        -names
            For fits file this will name data items using
            header.name. Default is to name as du+str(i)
        -attributes
            For fits file this will parse the header and write
            keyword value pairs as dataname+_attributes/.
            Default is to write dataname_fitsheader
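As an illustration of the explicit formatting lines described in the *ebfconvert* usage text, the following minimal Python sketch builds a comma-delimited ASCII table that starts with ``#fields``, ``#datatypes`` and ``#units`` lines. The column names, units, data values and the helper ``build_ascii_table`` are hypothetical and chosen only for this example. The spelling of the header lines follows the usage text, but how strictly *ebfconvert* treats whitespace inside them is not stated there, so treat this as a sketch rather than a reference.

```python
# Hypothetical column names, datatypes, units and rows, for illustration only.
FIELDS = ["ra", "dec", "velocity"]
DATATYPES = ["float32", "float32", "float32"]
UNITS = ["degree", "degree", "km/s"]
ROWS = [(10.5, -3.25, 120.0), (11.0, -3.5, 98.5)]


def build_ascii_table(fields, datatypes, units, rows):
    """Return the text of a comma-delimited table preceded by header lines."""
    if not (len(fields) == len(datatypes) == len(units)):
        raise ValueError("fields, datatypes and units must have equal length")
    # Header lines are written in the form shown in the ebfconvert usage text.
    lines = [
        "#fields =[" + ", ".join(fields) + "]",
        "#datatypes=[" + ", ".join(datatypes) + "]",
        "#units =[" + ", ".join(units) + "]",
    ]
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("row length does not match number of fields")
        # A plain comma is used as the only delimiter in the data rows.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


text = build_ascii_table(FIELDS, DATATYPES, UNITS, ROWS)
print(text, end="")
```

If this text is saved to a file, say catalog.txt (a hypothetical name), then according to the description above running *ebfconvert catalog.txt* should produce an output file named catalog.ebf.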