EBF, which stands for Efficient and Easy to use Binary Format, is a binary file format designed to make reading and writing binary data easy. Reading and writing routines are currently available in C, C++, Fortran, Java, Python, IDL and MATLAB. A program called ebftkpy, which provides a set of utility functions for working with .ebf files (e.g., viewing the contents and getting a summary), is also provided.
The EBF specification is designed to be concise and easy to understand, so that others can write their own code if needed. It is also designed to simplify the programming of input and output routines in different programming languages. In a nutshell, an EBF file is a collection of data objects. Each data object is identified by a unique name, and a single file can hold multiple data objects. Each data object is preceded by a header (metadata) that describes the binary data associated with it. Among other things, this header allows files to be portable across systems with different endianness. In EBF, binary data is always written in the native endian format of the writing machine (either little or big), and byte swapping is done while reading only if needed. This makes writing fast and avoids unnecessary swapping when reading and writing are done on the same architecture. EBF stores multidimensional arrays in row-major order, as in C/C++. Note that Fortran, IDL and MATLAB use column-major order.
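The "write native, swap on read" idea can be sketched with Python's standard `struct` module. This is only an illustration: the one-byte order flag and the function names `write_native` and `read_any` are assumptions made for the example and do not reproduce the actual EBF header layout.

```python
import struct
import sys


def write_native(values):
    """Pack int32 values in the writer's native byte order.

    A 1-byte flag ("<" little, ">" big) records that order.
    Hypothetical layout for illustration, not the real EBF header.
    """
    flag = b"<" if sys.byteorder == "little" else b">"
    # "=" means native byte order with standard sizes, so no swap on write.
    return flag + struct.pack("=%di" % len(values), *values)


def read_any(buf):
    """Unpack a buffer produced by write_native on any machine.

    struct swaps bytes only when the recorded order differs from
    the byte order of the machine doing the reading.
    """
    order = buf[:1].decode("ascii")
    if order not in ("<", ">"):
        raise ValueError("unknown byte-order flag: %r" % order)
    count = (len(buf) - 1) // 4
    return list(struct.unpack("%s%di" % (order, count), buf[1:]))
```

Because the order is recorded once by the writer, a reader on the same architecture pays no swapping cost, while a reader on a different architecture still recovers the same values.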
An EBF file is a collection of data objects. Each data object has two parts: a header (metadata) and the binary data that the header describes.
EBF has a built-in hash table for locating items. To access this table the ebf library makes use of two data objects:
- “/.ebf/info”
- “/.ebf/htable”.
These are automatically created by the ebf library.
Each data type has an integer code associated with it. The table below lists the supported data types and their integer codes.
Data type code | Data type | Data size (bytes) |
---|---|---|
0 | undefined | 0 |
1 | char | 1 |
2 | int32 | 4 |
3 | int64 | 8 |
4 | float32 | 4 |
5 | float64 | 8 |
6 | int16 | 2 |
7 | float16 | 2 |
8 | structure | Implementation defined |
9 | int8 | 1 |
10 | uint8 | 1 |
11 | uint16 | 2 |
12 | uint32 | 4 |
13 | uint64 | 8 |
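As an illustration, the table above can be written as a Python lookup and used to compute the size of the binary data of an object from its type code and dimensions. The names `EBF_TYPES` and `data_nbytes` are hypothetical and are not part of the ebf library.

```python
# Type code -> (type name, size in bytes), copied from the table above.
# None marks the structure type, whose size is implementation defined.
EBF_TYPES = {
    0: ("undefined", 0),
    1: ("char", 1),
    2: ("int32", 4),
    3: ("int64", 8),
    4: ("float32", 4),
    5: ("float64", 8),
    6: ("int16", 2),
    7: ("float16", 2),
    8: ("structure", None),
    9: ("int8", 1),
    10: ("uint8", 1),
    11: ("uint16", 2),
    12: ("uint32", 4),
    13: ("uint64", 8),
}


def data_nbytes(type_code, dims):
    """Return the number of bytes occupied by an array of the given type."""
    name, size = EBF_TYPES[type_code]
    if size is None:
        raise ValueError("size of %s is implementation defined" % name)
    count = 1
    for d in dims:
        count *= d
    return count * size
```

For example, a float32 array with dimensions [163 3] occupies 163 * 3 * 4 = 1956 bytes.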
ebftk is a Python script that contains a set of utility functions for manipulating EBF files. In an EBF file each data item has a unique tag name that begins with “/”, e.g., “/FeH”, “/Pos3”. Data can be queried using tag names. ebftk can be used to get information about the data (TagNames) in a file, to print data in ASCII or CSV format by TagName, to rename items, to remove items, and so on.
The ebftk and ebfconvert scripts are installed automatically when the Python ebf module is installed. With setup.py the Python ebf module is installed in a standard location, but the location of the ebftkpy script can be controlled by specifying --install-scripts=mypath. Here mypath can be /usr/local/bin/ or a directory in your home directory where you keep your programs, e.g., /home/user/sw/bin/. With the --user option the scripts are generally installed in ~/.local/bin/. If mypath is not in your search path, you need to add it. Note that the Python numpy module is required; if it is not already installed (although it usually is), you will have to install it as well.
$pip install ebfpy OR
$pip install ebfpy --user
Alternatively, install from the source archive:
$tar -zxvf ebfpy_x.x.x.tar.gz
$cd ebfpy_x.x.x
$python setup.py install --user OR
$python setup.py install --user --install-scripts=mypath OR
$python setup.py install --install-scripts=mypath
If mypath is not already in your search path then you can add it as follows.
For bash shell: in ~/.bashrc file add
export PATH=$PATH:/home/username/mypath
For tcsh or csh: in ~/.cshrc file add
set path = ($path /home/username/mypath)
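To confirm that the scripts are reachable after installation, a small Python check along the following lines can help. It uses only the standard library; the helper names `dir_on_path` and `find_script` are made up for this sketch.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if directory is one of the entries of the search path."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


def find_script(name="ebftk"):
    """Return the full path of an executable on the search path, or None."""
    return shutil.which(name)
```

If `find_script()` returns None after installation, the directory given to --install-scripts (or ~/.local/bin/ for --user installs) is probably missing from the search path.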
NAME:
>>EBF<< (Efficient and Easy to use Binary File Format)
ebftkpy 0.0.1 - a toolkit for EBF files
Copyright (c) 2012 Sanjib Sharma
USAGE:
ebftk -list filename
ebftk filename (same as -list)
ebftk -cat filename "TagName1 TagName2 .."
ebftk -csv filename "TagName1 TagName2 .."
ebftk -ssv filename "TagName1 TagName2 .."
ebftk -stat filename "TagName1 TagName2 .."
ebftk -swap filename
ebftk -copy src_file dest_file
ebftk -copy src_file dest_file TagName
ebftk -diff filename1 filename2
ebftk -rename filename1 tagname_old tagname_new
ebftk -remove filename1 tagname
ebftk -htab filename
DESCRIPTION:
-list view headers/TagNames of data in file
-cat print data in ascii format
e.g., for "TagName1" a record of rank 2 with
dimensions N and 3 will print a Nx3 table,
for "TagName2" a record of rank 1 with dimension N
will print a column of size N
multiple tags can be specified as space separated
strings as "TagName1 TagName2"
but the condition is that the number of elements in
each record should be same. This will print a Nx4 table
-csv print data in csv tabular format, syntax same as cat
-ssv print data in tabular format like csv, but with space as the delimiter
-stat print min max mean stddev of specified data tags
-swap swap the endianness of a file, output file has
suffix _swap.ebf
-copy copy contents of one file to another or only a tag
-diff difference of two data items in two ebf files
-rename rename a data item
-remove remove a data item. It is renamed with prefix /.tr/
which can be restored using rename if needed
-htab get information about internal hashtable
CONTACT:
http://ebfformat.sourceforge.net or bugsanjib at gmail
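The table layout described for -cat can be sketched in Python as follows. The sketch assumes that each record is available as a flat, row-major list of values together with its dimensions, and it interprets the condition on record sizes as "same leading dimension N". The function names are hypothetical and this is not the code used by ebftk.

```python
def rows_from_flat(flat, dims):
    """Split a flat row-major buffer into rows; dims is [N] or [N, M]."""
    if len(dims) == 1:
        return [[value] for value in flat]
    n, m = dims
    # In row-major order the last index varies fastest, so row i
    # occupies the slice [i * m, (i + 1) * m) of the flat buffer.
    return [list(flat[i * m:(i + 1) * m]) for i in range(n)]


def cat_table(records):
    """Join records column-wise; records is a list of (flat, dims) pairs."""
    tables = [rows_from_flat(flat, dims) for flat, dims in records]
    n = len(tables[0])
    if any(len(table) != n for table in tables):
        raise ValueError("all records must have the same number of rows")
    return [sum((table[i] for table in tables), []) for i in range(n)]


if __name__ == "__main__":
    pos3 = ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])  # rank 2, N x 3
    mass = ([10.0, 20.0], [2])                       # rank 1, N
    for row in cat_table([pos3, mass]):
        print(" ".join(str(value) for value in row))
```

With a rank 2 record of dimensions N and 3 and a rank 1 record of dimension N, each output row has 4 values, which matches the Nx4 table mentioned in the description.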
An example printout of the command ebftkpy file.ebf is given below.
------------------------------------------------------------------
name dtype endian unit dim
------------------------------------------------------------------
/log char little [1552]
/typelist int32 little [1]
/typelist1 int32 little [114]
/pos3 float32 little [163 3]
/vel3 float32 little [163 3]
ebfconvert is a Python script that converts ASCII files to EBF files. Limited support for converting FITS files is also available. It can also handle large files that are difficult to fit in memory.
NAME:
>>EBF<< (Efficient and Easy to use Binary File Format)
ebfconvert 0.0.2 - converts ascii and fits files to EBF format
Can also handle very large files.
Copyright (c) 2013 Sanjib Sharma
USAGE:
ebfconvert [OPTIONS] filename
ebfconvert [OPTIONS] "prefix*suffix"
wild card must be in double quotes
DESCRIPTION:
Outfile name is constructed from filename with suffix replaced by .ebf
Files with suffix .fits .fit or .fts are assumed to be fits rest are
treated as ascii
Empty lines and lines beginning with # are ignored.
Following delimiters are allowed (tab,space, comma, semicolon).
First non commented line determines the delimiter.
Tab and spaces can be mixed but not others.
The program tries to guess datatype from data and chooses from
float64, int64 and string
Default column names are of form 'col'+str(i).
If the items in the first non-commented line are all strings then they are
treated as field names.
For explicit control over formatting before the start of data
following lines can be added
#fields =[name_1 , name_2, name_3, ...name_n]
#datatypes=[float32, int64, S , ...float64]
#units =[m/s , , kg , ...m*kg^{-2}]
for fixed format data one can specify widths of fields as
#widths =[5 , 4 , 10 , ...12]
instead of datatypes and widths one can also provide
#format ="%-8.2f%+3.2e %10i %#3u %04d"
This is c printf format %[flags][width][.precision]type
Note spaces between two % changes width of fields
Allowed formats codes (e, E, f, F, g, G, i, u, d, s)
format and width if present should not have blank members.
fields, datatypes, units can have blank members.
Allowed datatypes are (int8, int16, int32, int64, uint8, uint16,
uint32, uint64, float32, float64, S)
OPTIONS:
--join=join_file
When using wildcard this option joins multiple files into one.
All files to be joined should have same column names.
--schema=schema_file
This is an alternate way to explicitly specify formatting.
An example schema file is given below. Each field is separated
by comma. A field contains space separated entities
name datatype width unit. The width is for fixed width
format only, for other formats set it to 0.
(ra float32 0 degree,
dec float32 ,
velocity float32 0 )
-struct
will write data as numpy structure
-names
For fits file this will name data items using header.name.
Default is to name as du+str(i)
-attributes
For fits file this will parse the header and write keyword
value pairs as dataname+_attributes/. Default is to write
dataname_fitsheader
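The ASCII parsing rules listed in the ebfconvert DESCRIPTION (skipping empty and # lines, detecting the delimiter from the first data line, guessing datatypes, default column names) can be illustrated with a short Python sketch. This is a simplified reading of those rules and not the ebfconvert implementation: it ignores the #fields, #datatypes, #units, #widths and #format directives, and numbering the default names from 0 is an assumption.

```python
def detect_delimiter(line):
    """Choose the delimiter from the first data line."""
    for delim in (",", ";"):
        if delim in line:
            return delim
    # None makes str.split() split on any run of tabs and spaces.
    return None


def guess_dtype(values):
    """Choose int64, float64 or string (S) for a column of text items."""
    for name, convert in (("int64", int), ("float64", float)):
        try:
            for value in values:
                convert(value)
            return name
        except ValueError:
            continue
    return "S"


def guess_columns(text):
    """Return {column name: guessed datatype} for ASCII table text."""
    rows = []
    delim = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # empty lines and comment lines are ignored
        if not rows:
            delim = detect_delimiter(stripped)
        rows.append([item.strip() for item in stripped.split(delim)])
    if not rows:
        return {}
    if all(guess_dtype([item]) == "S" for item in rows[0]):
        names, rows = rows[0], rows[1:]  # first line holds field names
    else:
        names = ["col" + str(i) for i in range(len(rows[0]))]
    return {name: guess_dtype(col) for name, col in zip(names, zip(*rows))}
```

Trying int before float means that a column is reported as float64 only when at least one of its items is not a valid integer.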