Query Amazon Redshift from SQL Server

There is a connector for Amazon Redshift (a data warehouse product available on Amazon Webservices) in Power BI that works just fine including support for direct query. But what if you want to pull data from a Redshift database to SQL Server? I wanted this to add content from a third party data warehouse on AWS to an on premise institutional data warehouse. In this scenario there is an ODBC driver that you can use create a linked server to the external DB and pull the data straight out.

Install the Amazon Redshift ODBC Driver on the SQL Server where you’ll be creating the linked server. Bring up the ODBC Data Source Administration and add a new System DSN for Amazon Redshift.

Specify the data source name and complete the server, database and user details. For the Redshift DB I was connecting to this was sufficient but there’s additional documentation at AWS.

Then create a new linked server in SQL Server Management Studio. Specify the linked server name, select Microsoft OLE DB Provider for ODBC drivers and set both the product name and data source to the DSN name defined above.

Once created you can use OPENQUERY to execute pass through queries against the linked server. Wrapping these queries up in a view for convenience.

create view redshift.table_name
as


select	* 
from	openquery(	REDSHIFT, 

					'	
						select	*
						from	public.table_name	
					'
					)

At the backend Redshift is running Postgres so the queries need to be PostgreSQL. The main issue you might hit when querying the data is for long text fields. I found that columns longer then varchar(4000) would throw an error.

OLE DB provider "MSDASQL" for linked server "REDSHIFT" returned message "Requested conversion is not supported."

Casting these columns to varchar(max) resolved the issue but really hit performance to the point that the queries were unusable. There were some hints at a better solution but I couldn’t make them work in this case so in then end settled for casting for varchar(4000). Not ideal but in this specific case I know no data will be truncated for the affected columns.