Introduction

EBF, which stands for Efficient and Easy to use Binary Format, is a binary file format designed to make reading and writing binary data easy. Reading and writing routines are currently available in C, C++, Fortran, Java, Python, IDL and MATLAB. A program called ebftkpy (now ebftk), which provides a set of utility functions for working with .ebf files, e.g., viewing their contents and getting a summary, is also provided.

The EBF specification is designed to be concise and easy to understand, so that others can write their own code if needed. It is also designed to simplify the programming of input/output routines in different programming languages. In a nutshell, an EBF file is a collection of data objects. Each data object is identified by a unique name, and a single file can hold multiple data objects. Each data object is preceded by metadata, a header, which describes the binary data associated with it. Among other things, this header makes files portable across systems with different endianness. In EBF, binary data is always written in the native byte order of the writing machine (little or big endian), and byte swapping, if needed, is done while reading. This makes writing fast and avoids unnecessary swapping when reading and writing happen on the same architecture. EBF stores multidimensional arrays in row-major order, as C/C++ does. Note that Fortran, IDL and MATLAB use column-major order.
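The row-major versus column-major distinction can be illustrated with a small sketch. The two helpers below are hypothetical (they are not part of any EBF library); they compute the flat offset of an element from its index and the array shape. The final check shows why an array written in row-major order looks transposed to a column-major reader that uses the bytes as-is with the dimensions reversed.

```python
def row_major_offset(index, shape):
    """Flat offset in C/C++ order (used by EBF): last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Flat offset in Fortran/IDL/MATLAB order: first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


shape = (2, 3)  # 2 rows, 3 columns
print(row_major_offset((1, 0), shape))     # 3
print(column_major_offset((1, 0), shape))  # 1

# Reading row-major bytes with column-major indexing and the reversed
# shape gives the transpose: element (i, j) appears at (j, i).
assert all(
    row_major_offset((i, j), (2, 3)) == column_major_offset((j, i), (3, 2))
    for i in range(2)
    for j in range(3)
)
```

This is only an illustration of the memory layout; how each language binding of EBF maps dimensions is not described here.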

Main features:
  • Store multiple data items in one file, each with a unique tag name. Tag names follow the convention of Unix-style pathnames, e.g., /x or /mydata/x, which allows hierarchical storage of data.
  • Constant-time lookup of items via a built-in hash table, so storing many data items does not slow down access.
  • Automatic type and endian conversion.
  • Support for multiple programming languages: data can easily be read in C, C++, Fortran, Java, Python, IDL and MATLAB, which makes data easy to distribute.
  • Structures, including nested structures, are supported in Python and IDL.
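The "write native, swap on read" rule behind the automatic endian conversion can be sketched with Python's struct module. This is an illustration under stated assumptions, not the EBF implementation: the byte order is returned alongside the bytes as a stand-in for what a real header would record, and only int32 data is handled. The function names are hypothetical.

```python
import struct
import sys


def write_int32s(values):
    """Pack int32 values in this machine's native byte order and return
    the byte order too (standing in for what a header would record)."""
    return sys.byteorder, struct.pack(f"={len(values)}i", *values)


def read_int32s(file_order, payload):
    """Swap bytes only if the writer's byte order differs from ours."""
    if file_order != sys.byteorder:
        # Reverse the 4 bytes of every int32 word.
        payload = b"".join(
            payload[k:k + 4][::-1] for k in range(0, len(payload), 4)
        )
    return list(struct.unpack(f"={len(payload) // 4}i", payload))


# Same machine: no swap needed.
order, data = write_int32s([1, -2, 300])
print(read_int32s(order, data))  # [1, -2, 300]

# Simulate a file written on a machine with the opposite byte order.
foreign = "big" if sys.byteorder == "little" else "little"
prefix = ">" if foreign == "big" else "<"
print(read_int32s(foreign, struct.pack(prefix + "3i", 7, -8, 65536)))  # [7, -8, 65536]
```

Writing never pays for a swap; the cost is incurred only when a file crosses architectures.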

File Format

An EBF file is a collection of data objects. Each data object has two parts:

  • Header: the metadata describing the binary data
  • Binary data: the actual data
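To show why a header-plus-data layout lets a reader step from one object to the next, here is a sketch using a deliberately simplified toy header. The layout below (name length, name, type code, rank, dimensions) is an assumption made for illustration only and is NOT the actual EBF header format; the function names are hypothetical.

```python
import io
import struct

# Toy layout, NOT the real EBF header: name length (uint16), name bytes,
# type code (int32), rank (int32), dims (rank x int64), then raw data.
ELEMENT_SIZE = {2: 4, 5: 8}  # subset of EBF type codes: int32, float64


def write_object(f, name, type_code, dims, raw):
    encoded = name.encode("ascii")
    f.write(struct.pack("<H", len(encoded)) + encoded)
    f.write(struct.pack("<ii", type_code, len(dims)))
    f.write(struct.pack(f"<{len(dims)}q", *dims))
    f.write(raw)


def list_objects(f):
    """Walk the file header by header, seeking past each data block."""
    found = []
    while True:
        prefix = f.read(2)
        if not prefix:  # end of file
            return found
        (n,) = struct.unpack("<H", prefix)
        name = f.read(n).decode("ascii")
        type_code, rank = struct.unpack("<ii", f.read(8))
        dims = struct.unpack(f"<{rank}q", f.read(8 * rank))
        count = 1
        for d in dims:
            count *= d
        f.seek(count * ELEMENT_SIZE[type_code], io.SEEK_CUR)  # skip data
        found.append((name, type_code, dims))


buf = io.BytesIO()
write_object(buf, "/x", 2, (3,), struct.pack("<3i", 1, 2, 3))
write_object(buf, "/pos3", 5, (2, 3), struct.pack("<6d", *[float(v) for v in range(6)]))
buf.seek(0)
print(list_objects(buf))  # [('/x', 2, (3,)), ('/pos3', 5, (2, 3))]
```

Because the header carries the type and dimensions, the size of the binary data is known without reading it, which is what makes skipping possible.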

Hash table

EBF has a built-in hash table for locating items. To maintain this table, the EBF library uses two data objects:

  • “/.ebf/info”
  • “/.ebf/htable”

These are created automatically by the EBF library.
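The constant-time lookup can be pictured with a minimal open-addressing hash table that maps tag names to byte offsets. This sketch does not reproduce EBF's hash function or the layout of “/.ebf/htable”; it uses zlib.crc32 with linear probing as a stand-in, and the class name TagTable is hypothetical. Keys are lower-cased because tag names are case insensitive.

```python
import zlib


class TagTable:
    """Toy open-addressing hash table: tag name -> file offset."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def _probe(self, key):
        i = zlib.crc32(key.encode()) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)  # linear probing
        return i

    def insert(self, name, offset):
        if 2 * (self.count + 1) > len(self.slots):  # keep load factor <= 0.5
            self._grow()
        key = name.lower()  # tag names are case insensitive
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, offset)

    def lookup(self, name):
        entry = self.slots[self._probe(name.lower())]
        return None if entry is None else entry[1]

    def _grow(self):
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        for key, offset in old:
            self.slots[self._probe(key)] = (key, offset)


table = TagTable()
table.insert("/pos3", 1024)
table.insert("/Vel3", 4096)
print(table.lookup("/POS3"))     # 1024
print(table.lookup("/vel3"))     # 4096
print(table.lookup("/missing"))  # None
```

Keeping the table at most half full bounds the expected probe length, so lookup cost stays roughly constant as items are added.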

Type Codes

Each data type has an integer code associated with it. The table below lists the supported data types and their integer codes.

Code  Data type   Size (bytes)
----  ---------   ----------------------
0     undefined   0
1     char        1
2     int32       4
3     int64       8
4     float32     4
5     float64     8
6     int16       2
7     float16     2
8     structure   implementation defined
9     int8        1
10    uint8       1
11    uint16      2
12    uint32      4
13    uint64      8
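The table maps naturally onto Python's struct format characters. The mapping below is an illustrative sketch (EBF_TYPES, element_size and decode are hypothetical names, not part of the ebf module); codes 0 (undefined) and 8 (structure) are left out because they have no fixed element format.

```python
import struct

# EBF type code -> (name, struct format character).
EBF_TYPES = {
    1: ("char", "c"),
    2: ("int32", "i"),
    3: ("int64", "q"),
    4: ("float32", "f"),
    5: ("float64", "d"),
    6: ("int16", "h"),
    7: ("float16", "e"),
    9: ("int8", "b"),
    10: ("uint8", "B"),
    11: ("uint16", "H"),
    12: ("uint32", "I"),
    13: ("uint64", "Q"),
}


def element_size(code):
    # "<" selects standard sizes with no padding, matching the table.
    return struct.calcsize("<" + EBF_TYPES[code][1])


def decode(code, raw, little_endian=True):
    """Decode a buffer holding elements of a single type code."""
    fmt = EBF_TYPES[code][1]
    n = len(raw) // element_size(code)
    return struct.unpack(("<" if little_endian else ">") + str(n) + fmt, raw)


print(element_size(7))                                    # 2
print(decode(6, struct.pack("<2h", -1, 5)))               # (-1, 5)
print(decode(2, struct.pack(">i", 42), little_endian=False))  # (42,)
```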

Tag name Conventions

  • Tag names (e.g., “/simulation1/px”) are analogous to Unix file paths and begin with “/”, denoting the root directory. This allows hierarchical arrangement of data.
  • Tag names are case insensitive.
  • Tag names should be alphanumeric (A to Z, 0 to 9) and may include the underscore character. The first character should be alphabetic, for compatibility across programming languages (e.g., IDL).
  • Tag names containing the ‘.’ (dot) character are reserved for special use and should be avoided. They are ignored by operations such as copying and diffing.
  • The ‘/log’ tag name is used to record, as a string of characters, how the data was created.
  • Although the format specification allows arrays of structures, support for them is presently available only in Python and IDL. Such cases generally arise when writing tables, where each column can potentially be of a different data type. A more user-friendly way to write a table is to write each column as a separate data object. This allows selective retrieval of columns and also allows new columns to be added to the data file.
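The conventions above can be checked mechanically. The sketch below is a hypothetical helper, not part of the ebf module; applying the alphanumeric and first-letter rules to each "/"-separated component is an interpretation of the text, and the function rejects names containing a dot since those are reserved.

```python
import re

# One path component: a letter, then letters, digits or underscores.
_COMPONENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def normalize_tag(name):
    """Return the lower-case form of a tag name, or raise ValueError
    if it breaks the naming conventions (interpreted per component)."""
    if not name.startswith("/"):
        raise ValueError(f"tag name must start with '/': {name!r}")
    for part in name[1:].split("/"):
        if "." in part:
            raise ValueError(f"'.' is reserved for special tags: {name!r}")
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid component {part!r} in {name!r}")
    return name.lower()  # tag names are case insensitive


print(normalize_tag("/Simulation1/PX"))  # /simulation1/px
```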

EBF toolkit ebftk (formerly ebftkpy)

ebftk is a Python script containing a set of utility functions for manipulating EBF files. In an EBF file, each data item has a unique tag name that begins with “/”, e.g., “/FeH” or “/Pos3”, and data are queried by tag name. ebftk can be used to list the data items (TagNames) in a file, print data in ASCII or CSV format by TagName, rename items, remove items, and so on.

Installation

The ebftk and ebfconvert scripts are installed automatically when the Python ebf module is installed. With setup.py, the Python ebf module is installed in a standard location, but the location of the scripts can be controlled with --install-scripts=mypath, where mypath can be /usr/local/bin/ or a directory in your home area where you keep your programs, e.g., /home/user/sw/bin/. With the --user option, the scripts are generally installed in ~/.local/bin/. If mypath is not in your search path, you need to add it. Note that the numpy module is also required; if it is not already installed (it usually is), you will have to install it as well.

$pip install ebfpy           OR
$pip install ebfpy --user

Alternatively, install from the source tarball:

$tar -zxvf ebfpy_x.x.x.tar.gz
$cd ebfpy_x.x.x
$python setup.py install --user                            OR
$python setup.py install --user --install-scripts=mypath   OR
$python setup.py install  --install-scripts=mypath

If mypath is not already in your search path then you can add it as follows.

For the bash shell, add the following to ~/.bashrc:
export PATH=$PATH:/home/username/mypath

For tcsh or csh, add the following to ~/.cshrc:
set path = ($path /home/username/mypath)

Usage

NAME:
        >>EBF<<  (Efficient and Easy to use Binary File Format)
        ebftkpy 0.0.1 - a toolkit for  EBF  files
        Copyright (c) 2012 Sanjib Sharma
USAGE:
        ebftk   -list filename
        ebftk    filename  (same as -list)
        ebftk   -cat filename "TagName1 TagName2 .."
        ebftk   -csv filename "TagName1 TagName2 .."
        ebftk   -ssv filename "TagName1 TagName2 .."
        ebftk   -stat filename "TagName1 TagName2 .."
        ebftk   -swap filename
        ebftk   -copy src_file dest_file
        ebftk   -copy src_file dest_file TagName
        ebftk   -diff  filename1 filename2
        ebftk   -rename  filename1 tagname_old tagname_new
        ebftk   -remove  filename1 tagname
        ebftk   -htab filename
DESCRIPTION:
        -list     view headers/TagNames of data in file
        -cat      print data in ascii format
                  e.g., for "TagName1" a record of rank 2 with
                  dimensions N and 3 will print a Nx3 table,
                  for "TagName2" a record of rank 1 with dimension N
                  will print a column of size N
                  multiple tags can be specified as space separated
                  strings as "TagName1 TagName2",
                  but all records must have the same leading dimension N.
                  For the two tags above this prints an Nx4 table
        -csv      print data in csv tabular format, syntax same as cat
        -ssv      print data in tabular format with space as the delimiter
        -stat     print min max mean stddev of specified data tags
        -swap     swap the endianness of a file, output file has
                  suffix _swap.ebf
        -copy     copy contents of one file to another or only a tag
        -diff     difference of two data items in two ebf files
        -rename   rename a data item
        -remove   remove a data item. It is renamed with prefix /.tr/
                  which can be restored using rename if needed
        -htab     get information about internal hashtable
CONTACT:
http://ebfformat.sourceforge.net or bugsanjib at gmail
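The way -cat, -csv and -ssv combine several records into one table can be sketched as follows. The helpers build_rows and format_rows are hypothetical and do not reproduce ebftk's actual number formatting; the records are plain Python lists standing in for data read from a file. An N x 3 record and an N record together give an N x 4 table, and a mismatch in N is rejected.

```python
def build_rows(records):
    """Join records column-wise. Each record is a list of length N whose
    items are either rows (rank-2 record) or scalars (rank-1 record)."""
    n = len(records[0])
    if any(len(r) != n for r in records):
        raise ValueError("all records must have the same leading dimension N")
    rows = []
    for i in range(n):
        row = []
        for record in records:
            item = record[i]
            row.extend(item if isinstance(item, (list, tuple)) else [item])
        rows.append(row)
    return rows


def format_rows(rows, delimiter):
    """delimiter="," gives -csv style output, " " gives -ssv style."""
    return "\n".join(delimiter.join(str(v) for v in row) for row in rows)


pos3 = [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]  # hypothetical N x 3 record
feh = [-0.2, 0.1]                          # hypothetical N record
print(format_rows(build_rows([pos3, feh]), ","))
# 0.5,1.0,1.5,-0.2
# 2.0,2.5,3.0,0.1
```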


An example printout of the command ebftkpy file.ebf is given below:
------------------------------------------------------------------
name                           dtype    endian  unit       dim
------------------------------------------------------------------
/log                           char     little             [1552]
/typelist                      int32    little             [1]
/typelist1                     int32    little             [114]
/pos3                          float32  little             [163   3]
/vel3                          float32  little             [163   3]

Converting ASCII files to EBF: ebfconvert

ebfconvert is a Python script that converts ASCII files to EBF files. Limited support for converting FITS files is also available. It can also handle files that are too large to fit in memory.
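How ebfconvert handles very large files is not described here; a common approach, sketched below as an assumption, is to stream the input and process it in fixed-size chunks so only one chunk is held in memory at a time. The helper iter_chunks is hypothetical; in a converter, each chunk would be appended to the output file instead of being summed.

```python
import io
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size parsed rows. Blank lines and
    '#' comments are skipped, and only one chunk is held in memory."""
    rows = (
        line.split()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


# A real file object can be passed directly; io.StringIO stands in here.
text = io.StringIO("# x y\n1 2\n3 4\n\n5 6\n")
total = 0.0
for chunk in iter_chunks(text, 2):
    # A converter would append this chunk to the output file instead.
    total += sum(float(row[0]) for row in chunk)
print(total)  # 9.0
```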

Usage

NAME:
        >>EBF<<  (Efficient and Easy to use Binary File Format)
        ebfconvert 0.0.2 - converts ascii and fits files to  EBF format
        Can also handle very large files.
        Copyright (c) 2013 Sanjib Sharma
USAGE:
        ebfconvert [OPTIONS] filename
        ebfconvert [OPTIONS] "prefix*suffix"
        wild card must be in double quotes
DESCRIPTION:
        Outfile name is constructed from filename with suffix replaced by .ebf

        Files with suffix .fits, .fit or .fts are assumed to be FITS; the rest
        are treated as ASCII.

        Empty lines and lines beginning with # are ignored.

        The following delimiters are allowed: tab, space, comma, semicolon.
        The first non-commented line determines the delimiter.

        Tabs and spaces can be mixed, but other delimiters cannot.

        The program tries to guess the datatype from the data, choosing from
        float64, int64 and string.

        Default column names are of form 'col'+str(i).

        If all items in the first non-commented line are strings, they are
        treated as field names.

        For explicit control over formatting, the following lines can be
        added before the start of the data:

        #fields   =[name_1 , name_2, name_3, ...name_n]
        #datatypes=[float32, int64, S      , ...float64]
        #units    =[m/s    ,      , kg     , ...m*kg^{-2}]

        For fixed-format data, field widths can be specified as
        #widths   =[5      , 4    , 10     , ...12]

        Instead of datatypes and widths, one can also provide
        #format   ="%-8.2f%+3.2e %10i %#3u %04d"
        This is C printf format: %[flags][width][.precision]type.
        Note that spaces between two % specifiers change the field widths.
        Allowed format codes are (e, E, f, F, g, G, i, u, d, s).

        format and width, if present, should not have blank members;
        fields, datatypes and units can have blank members.
        Allowed datatypes are (int8, int16, int32, int64, uint8, uint16,
        uint32, uint64, float32, float64, S).

OPTIONS:
        --join=join_file
                When using a wildcard, this option joins multiple files into one.
                All files to be joined should have the same column names.
        --schema=schema_file
                This is an alternate way to explicitly specify formatting.
                An example schema file is given below. Fields are separated
                by commas. Each field contains space-separated entities:
                name datatype width unit. The width applies to fixed-width
                format only; for other formats set it to 0.

                (ra       float32 0 degree,
                dec      float32         ,
                velocity float32 0       )
        -struct
                will write data as numpy structure
        -names
                For fits file this will name data items using header.name.
                Default is to name as du+str(i)
        -attributes
                For fits file this will parse the header and write keyword
                value pairs as dataname+_attributes/. Default is to write
                dataname_fitsheader
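The ASCII rules described above (comments and blank lines skipped, delimiter chosen from the first non-commented line, a header row detected when all its items are strings, types guessed from int64, float64 and string) can be sketched as follows. This is an illustrative reimplementation under assumptions, not ebfconvert's code: the function names are hypothetical, and details such as checking comma before semicolon, or treating "nan" as a number, may differ from the real program.

```python
def detect_delimiter(line):
    """Return ',' or ';' if present; None means any run of tabs/spaces."""
    for delim in (",", ";"):
        if delim in line:
            return delim
    return None


def split_line(line, delim):
    if delim is None:
        return line.split()
    return [field.strip() for field in line.split(delim)]


def guess_type(values):
    """Try the narrowest type first: int64, then float64, else string."""
    for name, conv in (("int64", int), ("float64", float)):
        try:
            for v in values:
                conv(v)
            return name
        except ValueError:
            continue
    return "S"


def is_number(v):
    try:
        float(v)
        return True
    except ValueError:
        return False


def parse_table(lines):
    """Return {column name: guessed datatype} for an ASCII table."""
    lines = [l for l in lines if l.strip() and not l.lstrip().startswith("#")]
    delim = detect_delimiter(lines[0])
    rows = [split_line(l, delim) for l in lines]
    if not any(is_number(v) for v in rows[0]):  # all strings: field names
        names, rows = rows[0], rows[1:]
    else:
        names = ["col" + str(i) for i in range(len(rows[0]))]
    return {n: guess_type(col) for n, col in zip(names, zip(*rows))}


print(parse_table(["# comment", "ra, dec, id", "10.5, -3.2, 7", "11.0, 4, 8"]))
# {'ra': 'float64', 'dec': 'float64', 'id': 'int64'}
print(parse_table(["1\t2 x", "3 4  y"]))
# {'col0': 'int64', 'col1': 'int64', 'col2': 'S'}
```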