Importing Data with FTP
When you import data, you upload it to an existing table.
Before You Start
Before you can follow the process below, these prerequisites must be met:
- Your site must be configured to use DSW.
- Your FTP credentials must be created.
Contact your Algonomy team if either of these is not the case.
The DSW FTP server can be accessed with this command:
ftp build-ftp.richrelevance.com 2222
Table Requirements
- The table must be external.
- The table must not be partitioned.
If you have already created a table that meets these requirements, you can simply upload the data. Otherwise, you need to create the table before uploading the data.
Once the table exists, you upload data using standard FTP commands. You can also use custom commands for sending the data from the FTP server to Hive.
Custom Commands
Once the data has been transferred to the FTP server, you can use custom commands to perform operations on the file.
There are currently two custom commands:
- CAT - prints the first lines of a file.
- HIVEUPLOAD - puts the file into a predefined Hive table. It takes two arguments:
  - The file name.
  - The table name, given as a -table argument.

These additional functions are invoked via the site command.
For example, assume you have a file foo.txt on your local file system. You can upload it to the FTP server using a standard PUT operation:
> put foo.txt
To print the first lines of the file, do this:
> site cat foo.txt
To upload the file to a Hive table foo_table, do this:
> site hiveupload foo.txt -table foo_table
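The same session can be scripted with Python's standard ftplib module. This is a sketch only: the host and port come from this document, the credentials are placeholders, and the function names are ours, not part of any Algonomy API:

```python
from ftplib import FTP


def hiveupload_cmd(filename: str, table: str) -> str:
    """Build the SITE HIVEUPLOAD command string for a file and target table."""
    return f"SITE HIVEUPLOAD {filename} -table {table}"


def upload_to_hive(host, port, user, password, local_path, remote_name, table):
    """Upload a local file via FTP, then load it into a Hive table.

    Assumes the DSW FTP server described above and valid FTP credentials.
    """
    ftp = FTP()
    ftp.connect(host, port)  # e.g. build-ftp.richrelevance.com, 2222
    ftp.login(user, password)
    with open(local_path, "rb") as fp:
        ftp.storbinary(f"STOR {remote_name}", fp)    # standard PUT
    print(ftp.sendcmd(f"SITE CAT {remote_name}"))    # print first lines
    ftp.sendcmd(hiveupload_cmd(remote_name, table))  # load into Hive
    ftp.quit()
```

The custom commands are plain SITE commands, so `sendcmd` is all that is needed; no special client support is required.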
Step-by-Step Example
This step-by-step example of importing a text file via FTP into Hive assumes that an FTP user account is set up for site 608.
In the Query Editor
- In the Query Editor, on the DataMesh Tools page, select the database you want to use, work608.
- Next, use the query editor on the Data tab to create a table:
CREATE EXTERNAL TABLE ftp_test ( id bigint, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ;
Prefer to work in Hive?
You can also create the table in Hive, but then the command looks like this:
CREATE EXTERNAL TABLE ftp_test ( id bigint, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
On a Local Machine
Create a text file ftptestcsv.txt with the data. In the table above, we declared two columns separated by a comma, so a file like this would be valid:
1,blah
2,blimp
3,blech
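If you prefer to generate the sample file programmatically, a minimal sketch (the file name and rows are taken from this example):

```python
rows = [(1, "blah"), (2, "blimp"), (3, "blech")]

# Write the two comma-separated columns declared by the ftp_test table.
with open("ftptestcsv.txt", "w") as f:
    for row_id, name in rows:
        f.write(f"{row_id},{name}\n")

# Sanity check: every line must split into exactly two fields,
# matching the table's FIELDS TERMINATED BY "," declaration.
with open("ftptestcsv.txt") as f:
    lines = f.read().splitlines()
assert all(len(line.split(",")) == 2 for line in lines)
```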
Connect to the FTP server, upload the file, and then load it into Hive:
ftp build-ftp.richrelevance.com 2222
// Log in as user/pwd
put ~/ftptestcsv.txt foo.txt
site hiveupload foo.txt -table ftp_test