Home   Index   list of chapters    previous chapter   next chapter  
 

Ch 8 - Exporting and Importing data - CSV files



8.1 What are CSV files?

To 2.6.4.2 (Editing Scrollable lists)
To 8.4.2 (Directing imported data to the correct fields)
To 8.4.3 (Importing data from plain text files)
To 9.1.2 (Merging the data from Powerbase)

CSV stands for "comma-separated values" and is the name given to files consisting of lines of data, each of which contains individual data items separated from each other by commas. Such files are widely used to mail-merge using a wordprocessor (see Ch 9) and also to transfer data from one application to another, e.g. from a database to a spreadsheet or from a RISC OS database to a PC database. We will refer to each line in a CSV file as a record and each item of data in such a line as a field since records and fields are the source and destination of such data when it is exported from or imported into a Powerbase database. The following should be noted:

  1. Fields which are non-numeric (e.g. plain text items such as names and addresses) are often enclosed in double quotes ("") whereas numeric data is not. This makes it possible for an application reading a CSV file to distinguish between numbers and strings (which might of course contain numerals) and also allows a comma to be used as a character within a string (such as often occurs in addresses) without being mistaken for a data separator.
  2. Null fields are usually included and can be located by looking for two commas with either nothing in between them or with only two double quotation marks between them, i.e. ,, or ,"",. Each record in such a file will always contain the same number of fields. Powerbase does, however, allow nulls to be omitted entirely when creating a file in which case the number of fields per record will vary.
  3. The way in which each record is terminated varies from one system to another. The last field in a record is followed not by a comma but by a line terminator. On RISC OS systems this is the same as in ordinary text files, the linefeed character (LF, ASCII value 10) and on PCs it is usually the carriage-return character (CR, ASCII value 13). You might, however, encounter CSV files in which both these characters are used, i.e. LF CR or CR LF. Powerbase allows any character or character-pair to be defined as the record terminator.
  4. Separators other than commas are sometimes used. The Tab character (ASCII value 9) is often used and such files are called TSV ("tab-separated values") files. Powerbase lets you define any character, or even a pair of characters, as the field separator. All such files created by Powerbase will be of type &dfe and display the CSV file icon although the default filenames offered do differ: "CSVfile" where the separator really is a comma, "TSVfile" where it's a TAB and "?SVfile" otherwise. In what follows we will, for convenience, refer to them all as "CSV files" whether or not the separator is a comma.
  5. Some applications which accept a CSV file as input expect the first line to contain the names of the fields which comprise the subsequent records, e.g. if each record consists of a name and a four-part address this header record might read: "NAME","STREET","TOWN","COUNTY","POSTCODE".
    A pair of radio buttons (only selectable when the Header button has been selected) enable you to choose between tags and descriptors in a header line, tags being the default. If you choose descriptors remember that a field doesn't necessarily have a descriptor. To avoid null strings in the header line Powerbase will substitute tags in such cases.
You can both export and import files with such a header but if importing a file from a non-Powerbase source you will need to substitute the correct field tags for the names. top of page

8.2 Setting the CSV options

To 8.4 (Using CSV files to import data)
To 14.2 (CSV options)

csvopt/png Choosing Options from the CSV files submenu displays the CSV options window (right) which lets you specify all the file characteristics described earlier. Pop-up menus give you a choice of field-separators and record-terminators with space to enter your own if you wish. The first three option buttons to the right of these menus cause an exported file to have, respectively, the following characteristics when the button is selected:

8.3 Exporting data as a CSV file

To 3.5 (Specifying which records to include)
To 8.4.3 (Importing data from plain text files)

Having set up your options as described above creating a CSV file is very like producing a report. First highlight the fields to be exported by clicking on each with ADJUST. Remember that the order in which the fields are highlighted is important. If a default field selection exists (i.e. a file in PrintRes called !Selection) it will be used to determine which fields are exported provided no alternative selection has been made.

csvexp/png

Next choose Export from the CSV files submenu (Ctrl X). The window which then appears is like a Save box but also features the Query panel. Type in a search formula to determine which records are exported. Finally, enter the name of the file and drag the file icon to a filer window. By default the file is saved in PrintJobs as usual, and you may simply click on the Export button or type Return. Experiment with saving CSV files with different settings of the CSV options and then loading the resulting files into Edit to examine them. If you select the Reverse button on the Query panel the CSV file will be created in reverse order.

The field-concatenation option (see 3.2.2) applies. This means that data which occupies separate fields in the Powerbase record need not do so in the exported CSV file. By holding down Shift when selecting the field with ADJUST, the comma (or other separator) which would normally follow is suppressed until you select a field without using Shift. A slightly problematic situation occurs when you concatenate a mixture of numeric and non-numeric fields with the In quotes option selected. In such a case Powerbase will enclose the whole concatenated group in quotes. e.g. NAME, Z, M and SYM from the Elements database would be exported, for actinium, as "ACTINIUM 89 227 Ac"

The Spacer (see 3.10.2) — in the above instance it is the default setting of one space — is used to separate the concatenated fields. To export only the displayed record hold down Shift when starting the CSV export.top of page

8.3.1 Scrollable lists and CSV files
To 8.4.3 (Importing into a scrollable list)
When you include a Scrollable list as one of the fields in a CSV file it will be written as a single CSV field if the format selected in the Report options window is As single line. The character used to separate data from different rows of the Scrollable list is set in Scroll term in the CSV options window and the default is a semicolon. The data from individual cells in the same row of the scrollable list are separated by the same CSV separator as is used between CSV fields.

The above description might seem confusing but will become clearer if you export data from the sample database Music as follows:

Only two records are exported. You will see that the composer's name is between quotes, as are the titles of the two works. For each record all the data from the Scrollable list is enclosed between quotes. In the second record you will see Andante tranquillo,08:05,1; Note the commas separating items from the same row of the Scrollable list and the semicolon indicating the row end. Next comes the data from the second row: Allegro,07:15,2; then the third: Adagio,08:32,3; and finally the fourth: Allegro molto,07:15,4 (no semicolon, this is the end of the line). Everything from Andante to the concluding 4 is enclosed between quotes.

A file exported in this way can be re-imported into a Powerbase database. You must ensure that the scrollable list data goes back into a scrollable list field, although it need not have the same number of columns as the one from which it was exported. If the new list has more columns than the old there will be one or more blank columns on the right hand side; if there are fewer columns data from the missing columns will be discarded. This closely resembles what happens when there are more or fewer fields in the database than in the one from which the file was exported (see 8.4.2).

Please note that the default CSV field separator and row terminator (comma and semicolon respectively) are by no means uncommon in textual data. If they occur as literal characters in the contents of individual cells Powerbase will be unable to distinguish these from genuine separators and row terminators and the data import will get out of step. The best solution, if you wish to export Scrollable list data from one database and import it into another, is to choose alternative characters for the separator and terminator. Tab has already been mentioned and there are several unlikely printable characters available on your keyboard. (Look at the key under the Esc key, for example.)

If you change the report format in Report options from As single line to As columns and export the CSV file again (in quotes as before) you will see the contents of each row enclosed between quotes and separated by a comma from the next row. The items from within each row are also separated by commas as before.

Exporting this second file without the quotes (or the first file if you first change Scroll term to a comma) will result in a file in which every single cell of the scrollable list appears as a separate CSV field, separated from the next by a comma. This might prove useful if transferring data from a database which contains a Scrollable list to one which doesn't.

Transferring data from and to scrollable lists in individual records was covered in 2.6.4.2.top of page

8.4 Using CSV files to import data

To Ch 2, Note 4

If you drop a CSV file on the record window of an open database the CSV options window (see 8.2) appears csv2/png with the title changed to "Import CSV file" and with some additional icons at the bottom of the window (see right), one of which displays the pathname of the file. Clicking on Import will make Powerbase try to create new database records from the file. If you decide not to do this you should click on Cancel. If you do wish to import the data there are some important consideration which will now be explained.

8.4.1 Ensuring that the correct options are selected
To 10.3 (Merging two databases)
If the CSV file originally came from a Powerbase database the settings in the CSV options window need to be exactly the same as they were when the file was exported. The exception to this is the In quotes button which is shaded on import because Powerbase doesn't need it. If the CSV file came from a PC or another RISC OS application you might have to load it into Edit to find out what separator and terminator are used. You will also probably need to set the filetype to &dfe as well.

There are three more option buttons at the bottom of the window. Display causes each record to be displayed as it is imported. The import process is slower with this turned on but more informative if you like to know how things are progressing. Strip spaces when selected strips any trailing spaces in the imported data-fields. Some database programs pad all fields to their maximum length by adding spaces to the end of the data where necessary. If you import such a file into Powerbase you will find that the caret will always be at the far right of the field even though visible characters do not fill the field, and some queries won't work properly. Selecting Strip spaces before importing the file overcomes the problem. The third option button determines how Sequence number fields are handled. If the button is ON imported sequence numbers are ignored and new ones assigned in accordance with the field's sequence number counter. With the button OFF sequence numbers from the CSV file are imported without alteration.top of page

8.4.2 Directing imported data to the correct fields
To 8.3.1 (Scrollable lists and CSV files)
To 8.5 (Using CSV files to modify existing records)
To 8.4.3 (Importing into a scrollable list)
If no fields on the record window are highlighted (i.e. with ADJUST) and the CSV file does not contain a header record (see 8.1 (5)) then the import process proceeds according to the following rules: There will be occasions when you don't want to fill the Powerbase fields sequentially as just described. There are two ways of making the process more specific:

(1) Highlight the required fields with ADJUST before starting the import. Data will then be read only into the highlighted fields, all other fields being ignored. The order in which the fields are filled is the order in which you highlighted them. The rules given above about underflow and overflow of data still apply.

(2) Give the file a header record containing the tags of required fields in the Powerbase record. There is nothing to prevent you from using Edit to add such a header to a CSV file which did not originate from a Powerbase application. The format of the header is illustrated in 8.1 (5). Importing then occurs just as if those fields were highlighted. Do not use both a header and highlighting.top of page

8.4.3 Importing into a scrollable list
Section 8.3.1 covered cases in which a CSV file includes scrollable list fields along with fields of other types. This section deals with the situation when a CSV file is specifically dropped onto a scrollable list with the object of importing data to the list alone.

Suppose you have exported such a file from a 3-column scrollable list as described in 2.6.4.2. If the file is subsequently dropped onto a 3-column list the data will fill the rows and columns exactly as they did in the list from which the file was exported, provided the cells are wide enough to hold the data items. If not the items are truncated to fit.

If you drop the file onto a 2-column list the third item in each line of the file will be ignored; the destination field will contain what the first two columns of the source field contained. If, on the other hand, you drop the file onto a 4-column scrollable list the 4th column of the list will be left blank. This is consistent with what happens when data is imported into more or fewer fields than the CSV file contains (see 8.4.2 concerning data underflow and overflow).top of page

8.4.4 Importing data from plain text files
(For data-import from a plain text file into an individual scrollable list field see 2.6.4.1).

It was explained in 8.1 (4) that files created as described above (8.3) can have separators other than a comma. If the file has been created using Powerbase's CSV exporting facility it will have been given the file-type &dfe as if it were a true CSV file. If it comes from some other source it is likely to be of type &fff (Text). It can still be imported but caution is needed because there are other circumstances in which a text file might be dropped on the record window. A properly-written script file (see Ch 12) would be recognised as such and therefore cause no problem, but any text file dropped onto the appropriate type of External field (i.e. a Text, Text Block or Referenced field) would become linked to that field instead of being treated like a CSV file. If you are importing data from a plain text file and your record contains fields of the aforementioned types be sure to drop the file on the window background, not on the External field The appropriate window will then appear with the title "Import text file".

You are strongly advised to use proper CSV files if at all possible. You could, in fact, use the filer to set the filetype of such non-standard files to &dfe (omit the "&") so that the problem of Powerbase taking the wrong action doesn't arise. Be sure to use Options to set the correct field separator and record terminator though.top of page

8.4.5 What if the imported data won't fit?
There are two situations in which this can happen. The database might not contain enough free records to hold all the imported data and so you get a "Database full when reading CSV file" error. To avoid this either make sure the database is big enough before you start or place a suitable value in the Increment for expansion icon in the Change length window. The latter is accessible from the Utilities submenu of the iconbar menu (see 10.4).

The second situation is where an item is too long for the destined database field. When importing data Powerbase maintains a file called TooBig inside the database's PrintJobs directory. Anything which won't fit in the target field is instead written to this file together with information about where it was intended to go and an "@" character is placed in the database field to draw your attention to it. No Powerbase field other than an External type may be longer than 246 characters and if an item of imported data exceeds this a note will be made in the TooBig file advising you to define an External field (Text Block or Text) for such data. After completing a CSV import operation it is advisable to look at the TooBig file to see if any remedial action is needed.top of page

8.5 Using CSV files to modify existing records

In addition to using CSV import for creating new records Powerbase will let you use it to modify existing records. This capability should be used with caution since careless use can irrevocably garble a database. There are three relevant radio buttons in the CSV options window. They are called Modify existing, With primary key and With record number and their actions are as follows:

When any of the above three buttons is selected, clicking Import will warn you what is about to take place and ask for confirmation. If you cancel the operation the radio button will be deselected and the normal default (Create new records) selected instead.top of page

8.6 Importing data into validation tables

To 5.6 (The validation table menu)

It was explained in section 5.6 that the contents of a validation table can be exported as a CSV file. You may also import data by dropping a CSV file onto the table. The Options window appears as described in 8.4 so that the appropriate separator and terminator can be selected if necessary.

It is not possible to increase the number of rows or columns in a validation table by this method. Only those items for which a space exists will be imported; the rest are ignored. Over-long data items are truncated. If a CSV file is dropped onto a table with more columns than there are fields in the CSV record, one or more columns in the table will be left blank.

This facility makes it possible to convert an old-style (i.e. non-modifiable) validation table to a new-style (modifiable) one without having to re-type all the data. First export the data as a CSV file, then create a new table (which may have more rows and/or columns than the old one if you wish) and drop the CSV file onto it. If you give the new table the same name as the old one the former will overwrite the latter when the database is closed.top of page

8.7 Creating a new, working database from a CSV file

There is an option button in the CSV options window called With field data which only becomes active when the With header button is selected. If With field data is selected the header record of a saved CSV file contains not only the field names but also the field lengths and types. Again it is suggested that you create a file with both With header and With field data selected and look at it in Edit.

Each part of the header contains four items of information separated by | (vertical bar) characters. These are, in order:

  1. the field type number (These may be examined in the file !Powerbase.Resources.ValStrings.)
  2. the maximum length of the field in characters
  3. the field descriptor
  4. the field tag
The CSV separator, as used in the remainder of the file, is used to separate each of these groups from the next. Assuming the separator to be a comma the structure of the header is:

<type>|<length>|<desc>|<tag>,<type>|<length>|<desc>|<tag>, ...

It is quite possible to take a CSV file from a non-Powerbase source and add such a header to it by hand. The field-type can be omitted, in which case the type is set to 0 (Unrestricted) and remaining number is assumed to be the length. It is also possible to omit either the descriptor or tag, but not both. If only one string is supplied it will be used for both descriptor and tag. (Remember that tags cannot be longer than four characters so anything longer will be truncated; take care that this does not lead to tag-duplication or an error will occur.) The minimum specification for each field is therefore <length>|<tag>, where the string <tag> is also used for the descriptor. You can force a null descriptor by putting two bar characters together as in <length>||<tag>. (A null tag is, of course, not allowed.)

A file created in this form has enough information in its header to be converted into a fully-functioning database. It is not meant to be dropped onto the window of an open database. It should be dropped onto the Powerbase icon on the iconbar when no database is open. The Import CSV window will appear. Click Import and you will be asked to enter a name for the database which will always be created in the same directory as the CSV file. All the fields will be ranged on the left of the record window, one beneath another, and the primary key will consist of the first four characters of the first field, but these characteristics can be changed later using the Alter format utility (see 10.2).

The feature described is only really intended for creating simple databases in which all fields are of Editable or Check-box types. The contents of Computed fields will be transferred and will have the correct data type but the associated formulae relating them to other fields will have to be entered manually. Don't try to create databases containing Scrollable lists, Button or External fields using this method.top of page

8.8 Other methods of exporting data

To 2.5.2 (Copying fields)

It was noted in section 2.5.2 that the contents of a record field can be dragged by means of SELECT and dropped into another field, a writable icon in a dialogue box, an editor or wordprocessor, or a filer window. In the last case the data is saved as a text file named from the tag of the data field.

This drag-and-drop method can be extended considerably. If a number of fields are selected, as in a print selection, then dragging and dropping as above will transfer the whole selection to an editor, wordprocessor or to the filer (where it is saved as a file called Selected). You can't transfer multiple fields between databases in this way though, and dragging such a selection to a writable icon transfers the first field in the selection only. When dragging a selection like this the pointer can either be over a field (not necessarily a selected one) or over the window background; it doesn't matter.

An even more powerful extension is the ability to export all the exportable fields in a record by holding down Ctrl and dragging from anywhere in the record window. If you have two identical databases open this method can copy a record from one to the other as well as to editors, wordprocessors and the filer.

The main menu has an entry Export selected which displays an ordinary Save box. This is an old feature of Powerbase, now superseded by the drag-and-drop facilities described above apart from the fact that the older method lets you choose your own filename for saved data.top of page

 

Home   Index   list of chapters    previous chapter   next chapter