Introduction
------------

EBF, which stands for Efficient and Easy to use Binary Format, is a
file format for reading and writing binary data easily. Read and write routines are currently
available in C, C++, Fortran, Java, Python, IDL and MATLAB. A program called *ebftk*
(formerly *ebftkpy*), which has a set of utility functions for working with .ebf files, e.g.,
viewing the contents and getting a summary, is also provided.

The EBF specification is designed to be concise and easy to
understand, so that others can write their own code if needed.
It is also designed to
simplify the programming of input/output routines in different
programming languages. In a nutshell, an EBF file is a
collection of data objects. Each data object is identified by a unique name, and a single file can
hold multiple data objects.
Each data object is preceded by a header (metadata) that
describes the binary data associated with it. Among other things, this header makes the
files portable across systems with different endianness. In EBF, binary data is
always written in the native endian format of the writing machine (either little or big),
and byte swapping is done while reading if needed. This makes writing data fast and avoids
unnecessary swapping when reading and writing are done on the same architecture.
EBF stores multidimensional arrays in row-major order, as in
C/C++. Note that Fortran, IDL and MATLAB use column-major order.

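The two storage rules above can be sketched in a few lines of Python.
This is only an illustration using the standard ``struct`` module, not code
from the EBF library; the function names are hypothetical, and it is assumed
that the reader learns the byte order of the file from the header.

.. code-block:: python

    import struct
    import sys

    def read_int32_array(raw, count, file_is_little_endian):
        """Decode `count` int32 values from `raw`.

        The explicit "<" or ">" prefix makes struct swap bytes only when
        the byte order of the file differs from that of this machine.
        """
        prefix = "<" if file_is_little_endian else ">"
        return list(struct.unpack(prefix + str(count) + "i", raw))

    def row_major_offset(index, dims):
        """Flat position of a multidimensional index in row-major order
        (the last index varies fastest, as in C/C++)."""
        offset = 0
        for i, n in zip(index, dims):
            offset = offset * n + i
        return offset

    values = [1, 2, 3, 4, 5, 6]
    native_is_little = sys.byteorder == "little"

    # Writing uses the native byte order, so no swapping is needed.
    raw = struct.pack("=6i", *values)
    decoded = read_int32_array(raw, 6, native_is_little)

    # A file written on a machine with the opposite byte order.
    foreign_prefix = ">" if native_is_little else "<"
    foreign = struct.pack(foreign_prefix + "6i", *values)
    decoded_foreign = read_int32_array(foreign, 6, not native_is_little)

Both ``decoded`` and ``decoded_foreign`` hold the original values. For an array
with dimensions (2, 3), ``row_major_offset((1, 2), (2, 3))`` gives the flat
position of the last element.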

The main features of EBF are:

- Store multiple data items in one file, each having a unique tag name

  + tag names follow the convention of Unix-style pathnames, e.g. /x or /mydata/x
  + this allows hierarchical storage of data

- Constant-time lookup of items due to the use of a built-in hash table

  + one can store as many data items as one wants without slowing down lookup

- Automatic type and endian conversion
- Support for multiple programming languages

  + data can easily be read in C, C++, Fortran, Java, Python, IDL and MATLAB
  + facilitates easy distribution of data

- Structures supported in Python and IDL

  + nested structures are also supported


File Format
-----------

An EBF file is a collection of data objects. Each data object has two
parts:

* Header: contains the metadata
* Binary data: the actual data

.. figure:: ebf_format.svg


Hash table
----------------
EBF has a built-in hash table for locating items. To access
this table the EBF library makes use of two data objects:

* "/.ebf/info"
* "/.ebf/htable"

These are created automatically by the EBF library.

Type Codes
-----------
Each data type has an integer code associated with it. The table 
below lists the supported data types and their integer codes.

==== ========= ======================
Code Data type Data size (bytes)
==== ========= ======================
0    undefined 0
1    char      1
2    int32     4
3    int64     8
4    float32   4
5    float64   8
6    int16     2
7    float16   2
8    structure Implementation defined
9    int8      1
10   uint8     1
11   uint16    2
12   uint32    4
13   uint64    8
==== ========= ======================

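As an illustration, the table above can be expressed as a Python lookup
table. This is a sketch only; the names ``TYPE_CODES`` and
``data_size_bytes`` are hypothetical and are not part of the EBF library.

.. code-block:: python

    # Type code -> (type name, item size in bytes), copied from the table.
    # The size of a structure is implementation defined, hence None.
    TYPE_CODES = {
        0: ("undefined", 0),
        1: ("char", 1),
        2: ("int32", 4),
        3: ("int64", 8),
        4: ("float32", 4),
        5: ("float64", 8),
        6: ("int16", 2),
        7: ("float16", 2),
        8: ("structure", None),
        9: ("int8", 1),
        10: ("uint8", 1),
        11: ("uint16", 2),
        12: ("uint32", 4),
        13: ("uint64", 8),
    }

    def data_size_bytes(type_code, dims):
        """Size in bytes of an array with the given type code and dimensions."""
        name, item_size = TYPE_CODES[type_code]
        if item_size is None:
            raise ValueError("size of type '%s' is implementation defined" % name)
        count = 1
        for n in dims:
            count *= n
        return count * item_size

For example, a float32 array with dimensions (163, 3) occupies
``data_size_bytes(4, (163, 3))`` bytes of binary data.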

Tag name Conventions
---------------------

* Tag names (e.g. "/simulation1/px") are analogous to Unix filenames and
  begin with "/", denoting the root directory. This allows for a
  hierarchical arrangement of data.

* Tag names are case insensitive.

* Tag names should be alphanumeric (A to Z, 0-9) and can include the
  underscore character. The first character should be alphabetic to allow
  compatibility across different programming languages (e.g. IDL).

* Tag names containing the '.' (dot) character are reserved for special
  use and should be avoided. They are ignored when performing
  operations such as copying, diff and so on.

* The '/log' tag name is used to record how the data was created,
  written as a string of characters.


* Although the format specification allows for an array of structures,
  support for this is presently only available in Python and IDL.
  Such cases generally arise when writing tables where potentially
  each column can be of a different data type. A more user-friendly
  way of writing a table is to write each column as a separate data
  object. This allows selective retrieval of columns and also
  allows new columns to be added to the data file.
 
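The naming conventions above can be checked with a short Python function.
This is a sketch under stated assumptions, not the validation performed by
the EBF library: ``check_tagname`` is a hypothetical helper, the
"alphabetic first character" rule is applied to every path component, and
case insensitivity is modelled by converting the name to lower case.

.. code-block:: python

    import re

    # One path component: a letter followed by letters, digits or underscores.
    _COMPONENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

    def check_tagname(tagname):
        """Return the lower-cased tag name, or raise ValueError if the
        name does not follow the conventions for user data."""
        if not tagname.startswith("/"):
            raise ValueError("tag name must begin with '/'")
        if "." in tagname:
            raise ValueError("tag names containing '.' are reserved")
        for part in tagname[1:].split("/"):
            if not _COMPONENT.match(part):
                raise ValueError("invalid component %r in %r" % (part, tagname))
        # Assumption: names that differ only in case refer to the same item.
        return tagname.lower()

    normalized = check_tagname("/Simulation1/px")

Reserved names such as "/.ebf/info" are rejected by this check because they
are meant for internal use only.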

EBF toolkit *ebftk* (formerly *ebftkpy*)
----------------------------------------
*ebftk* is a Python script that contains a set of utility functions
for manipulating EBF files. In an EBF file each data item has a
unique tag name which begins with "/", e.g., "/FeH", "/Pos3".
Data can be queried using tag names. *ebftk* can be used to get
information about the data items (tag names) in a file, to print the data
in ASCII or CSV format, to rename items, to remove items
and so on.


Installation
^^^^^^^^^^^^^
The *ebftk* and *ebfconvert* scripts are installed automatically when the Python ebf
module is installed. Using setup.py, the Python ebf module is installed in a standard location,
but the location of the scripts can be controlled
by specifying *--install-scripts=mypath*. Here mypath can be */usr/local/bin/*
or a directory in your home directory where you keep your programs, e.g.,
*/home/user/sw/bin/*. With the *--user* option the scripts are generally
installed in *~/.local/bin/*.
Note that the Python numpy module is required; if it is not already installed
(although it is quite standard to have it) you will also have to install it.

::

 $pip install ebfpy           OR
 $pip install ebfpy --user

Alternatively, install from the source archive:

::

 $tar -zxvf ebfpy_x.x.x.tar.gz
 $cd ebfpy_x.x.x
 $python setup.py install --user                            OR 
 $python setup.py install --user --install-scripts=mypath   OR
 $python setup.py install  --install-scripts=mypath 


If *mypath* is not already in your search path then you can add it as follows.

::

 For bash shell:  in ~/.bashrc file add 
 export PATH=$PATH:/home/username/mypath 

 For tcsh or csh: in ~/.cshrc file add
 set path = ($path /home/username/mypath)


Usage
^^^^^^

::

 NAME:
	 >>EBF<<  (Efficient and Easy to use Binary File Format)
	 ebftkpy 0.0.1 - a toolkit for  EBF  files
	 Copyright (c) 2012 Sanjib Sharma 
 USAGE:
	 ebftk	 -list filename
	 ebftk	  filename  (same as -list)
	 ebftk	 -cat filename "TagName1 TagName2 .."
	 ebftk	 -csv filename "TagName1 TagName2 .."
	 ebftk	 -ssv filename "TagName1 TagName2 .."
	 ebftk	 -stat filename "TagName1 TagName2 .."
	 ebftk	 -swap filename
	 ebftk	 -copy src_file dest_file
	 ebftk	 -copy src_file dest_file TagName
	 ebftk	 -diff  filename1 filename2
	 ebftk	 -rename  filename1 tagname_old tagname_new
	 ebftk	 -remove  filename1 tagname
	 ebftk	 -htab filename
 DESCRIPTION:
	 -list     view headers/TagNames of data in file 
	 -cat      print data in ascii format
	           e.g., for "TagName1" a record of rank 2 with
	           dimensions N and 3 will print a Nx3 table,
	           for "TagName2" a record of rank 1 with dimension N
	           will print a column of size N
	           multiple tags can be specified as space separated 
	           strings as "TagName1 TagName2"  
	           but the condition is that the number of elements in
	           each record should be same. This will print a Nx4 table
	 -csv      print data in csv tabular format, syntax same as cat
	 -ssv      print data in tabular format like csv, but with space as delimiter
	 -stat     print min max mean stddev of specified data tags
	 -swap     swap the endianness of a file, output file has
	           suffix _swap.ebf
	 -copy     copy contents of one file to another or only a tag
	 -diff     difference of two data items in two ebf files
	 -rename   rename a data item
	 -remove   remove a data item. It is renamed with prefix /.tr/ 
	           which can be restored using rename if needed
	 -htab     get information about internal hashtable
 CONTACT:
 http://ebfformat.sourceforge.net or bugsanjib at gmail 
 

An example printout of the command *ebftk file.ebf* is given below.

::

 ------------------------------------------------------------------
 name                           dtype    endian  unit       dim       
 ------------------------------------------------------------------
 /log                           char     little             [1552]    
 /typelist                      int32    little             [1]       
 /typelist1                     int32    little             [114]     
 /pos3                          float32  little             [163   3] 
 /vel3                          float32  little             [163   3] 

Converting ASCII files to EBF *ebfconvert*
------------------------------------------
*ebfconvert* is a Python script that converts ASCII files to EBF files. Limited
support for converting FITS files is also available. It can also handle
large files that are difficult to fit in memory.


Usage
^^^^^^

::

 NAME:
	 >>EBF<<  (Efficient and Easy to use Binary File Format)
	 ebfconvert 0.0.2 - converts ascii and fits files to  EBF format
	 Can also handle very large files.
	 Copyright (c) 2013 Sanjib Sharma 
 USAGE:
	 ebfconvert [OPTIONS] filename
	 ebfconvert [OPTIONS] "prefix*suffix"
	 wild card must be in double quotes
 DESCRIPTION:
	 Outfile name is constructed from filename with suffix replaced by .ebf 

	 Files with suffix .fits .fit or .fts are assumed to be fits rest are
	  treated as ascii 

	 Empty lines and lines beginning with # are ignored.

	 Following delimiters are allowed (tab,space, comma, semicolon).
	 First non commented line determines the delimiter. 

	 Tab and spaces can be mixed but not others.

	 The program tries to guess datatype  from data and chooses from
	 float64, int64 and string

	 Default column names are of form 'col'+str(i).

	 If the items in the first non-commented line are all strings then they are
	 treated as field names.

	 For explicit control over formatting before the start of data
	 following lines can be added 

	 #fields   =[name_1 , name_2, name_3, ...name_n]
	 #datatypes=[float32, int64, S      , ...float64]
	 #units    =[m/s    ,      , kg     , ...m*kg^{-2}]

	 for fixed format data one can specify widths of fields as
	 #widths   =[5      , 4    , 10     , ...12]

	 instead of datatypes and widths one can also provide
	 #format   ="%-8.2f%+3.2e %10i %#3u %04d" 
	 This is c printf format %[flags][width][.precision]type
	 Note spaces between two % changes width of fields
	 Allowed formats codes (e, E, f, F, g, G, i, u, d, s)

	 format and width if present should not have blank members.
	 fields, datatypes, units can have blank members.
	 Allowed datatypes are (int8, int16, int32, int64, uint8, uint16, 
	 uint32, uint64, float32, float64, S)

 OPTIONS:
	 --join=join_file
		 When using wildcard this option joins multiple files into one.
		 All files to be joined should have same column names.
	 --schema=schema_file
		 This is an alternate way to explicitly specify formatting.
		 An example schema file is given below. Each field is separated
		 by comma. A field contains space separated entities
		 name datatype width unit. The width is for fixed width
		 format only, for other formats set it to 0.

		 (ra       float32 0 degree,
		 dec      float32         ,
		 velocity float32 0       )
	 -struct
		 will write data as numpy structure
	 -names
		 For fits file this will name data items using header.name.
		 Default is to name as du+str(i)
	 -attributes
		 For fits file this will parse the header and write keyword
		 value pairs as dataname+_attributes/. Default is to write
		 dataname_fitsheader
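
The delimiter and data type rules described in the usage text can be
sketched in Python as follows. This is not the code used by *ebfconvert*;
the function names are hypothetical, and checking for a comma before a
semicolon is an assumption. The sketch also skips the rule that treats a
first line of strings as field names.

.. code-block:: python

    def data_lines(lines):
        """Drop empty lines and lines beginning with '#'."""
        stripped = (line.strip() for line in lines)
        return [s for s in stripped if s and not s.startswith("#")]

    def guess_delimiter(line):
        """Return ',' or ';' if the line contains one, else None.

        None means "any run of tabs and spaces", which is how
        str.split() behaves when called with None.
        """
        for delim in (",", ";"):
            if delim in line:
                return delim
        return None

    def guess_datatype(items):
        """Choose int64, float64 or string for one column of text items."""
        for caster, name in ((int, "int64"), (float, "float64")):
            try:
                for item in items:
                    caster(item)
                return name
            except ValueError:
                continue
        return "string"

    def parse_columns(lines):
        """Split the data lines into columns of text items."""
        data = data_lines(lines)
        # The first non-commented line determines the delimiter.
        delim = guess_delimiter(data[0])
        rows = [[item.strip() for item in line.split(delim)] for line in data]
        return list(zip(*rows))

    lines = ["# a comment", "", "1, 2.5, abc", "2, 3.5, def"]
    columns = parse_columns(lines)
    datatypes = [guess_datatype(column) for column in columns]

Here ``datatypes`` holds one guessed type name per column of the input.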