Importing Data with FTP

When you import data, you upload it to an existing table.

Before You Start

Before you can follow the process below, these prerequisites must be met:

  • Your site must be configured to use DSW.

  • Your FTP credentials must be created.

Contact your Algonomy team if either of these is not the case.

The DSW FTP server can be accessed with this command:

Copy
ftp build-ftp.richrelevance.com 2222

Table Requirements

  • The table must be external.

  • The table must not be partitioned.

If a table meeting these requirements already exists, you can upload the data right away. If the table you want does not exist yet, create it before uploading the data.

Once the table exists, you upload data using standard FTP commands. You can also use custom commands to send the data from the FTP server to Hive.

Custom Commands

Once the data has been transferred to the FTP server, you can use custom commands to perform operations on the file.

There are currently two custom commands:

  • CAT - prints the first lines of a file

  • HIVEUPLOAD - puts the file into a predefined Hive table. It takes two arguments:

    • The file name.

    • The table name, given as a -table argument.

These custom commands are invoked via the FTP site command.

For example, assume you have a file foo.txt on your local file system. You can upload it to the FTP server with a standard put operation:

Copy
> put foo.txt 

To print the first lines of the file, do this:

Copy
> site cat foo.txt 

To upload the file to a Hive table foo_table, do this:

Copy
> site hiveupload foo.txt -table foo_table

The following step-by-step example of importing a text file via FTP to Hive assumes that an FTP user account has been set up for site 608.

In the Query Editor

  1. In the Query Editor, on the DataMesh Tools page, select the database you want to use (in this example, work608).

  2. Use the query editor on the Data tab to create a table:

Copy
CREATE EXTERNAL TABLE ftp_test ( id bigint, name string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ;

Prefer to work in Hive?

You can also create the table in Hive, but then the command looks like this:

Copy
CREATE EXTERNAL TABLE ftp_test ( id bigint, name string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," 
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

On a Local Machine

Create a text file ftptestcsv.txt containing the data. The table created above declares two columns separated by a comma, so a file like this is valid:

Copy
1,blah 
2,blimp 
3,blech
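Before uploading, you may want to confirm the file actually matches the declared schema (a bigint id and a string name, comma-delimited), since malformed rows will not load cleanly. A minimal sketch of such a check in Python (the function name is illustrative, not part of DSW):

```python
def validate_rows(lines):
    """Check that each line holds a bigint id and a name, comma-separated."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(",")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        try:
            int(fields[0])  # the id column must parse as an integer (bigint)
        except ValueError:
            raise ValueError(f"line {lineno}: id {fields[0]!r} is not an integer")

with open("ftptestcsv.txt", "w") as f:
    f.write("1,blah\n2,blimp\n3,blech\n")
with open("ftptestcsv.txt") as f:
    validate_rows(f)  # passes silently for the sample data above
```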

Connect to the FTP server, upload the file, and load it into Hive:

Copy
ftp build-ftp.richrelevance.com 2222 
# log in with your FTP user name and password
put ~/ftptestcsv.txt foo.txt 
site hiveupload foo.txt -table ftp_test
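The same session can be scripted with Python's standard ftplib, which sends site commands through FTP.sendcmd. This is a sketch under the assumption that the server accepts the commands exactly as shown above; the function name is illustrative, and the host, port, and credentials in the commented session are placeholders:

```python
from ftplib import FTP

def upload_to_hive(ftp, local_path, remote_name, table):
    """Upload a local file over FTP, then load it into a Hive table
    via the server's custom HIVEUPLOAD site command."""
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {remote_name}", f)       # standard put
    print(ftp.sendcmd(f"SITE CAT {remote_name}"))      # print the first lines
    return ftp.sendcmd(f"SITE HIVEUPLOAD {remote_name} -table {table}")

# Example session (credentials are placeholders):
# ftp = FTP()
# ftp.connect("build-ftp.richrelevance.com", 2222)
# ftp.login("user", "pwd")
# upload_to_hive(ftp, "ftptestcsv.txt", "foo.txt", "ftp_test")
# ftp.quit()
```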