Today, Amazon Redshift announced support for a new native data type called GEOMETRY. This new type enables ingestion, storage, and queries against two-dimensional geographic data, together with the ability to apply spatial functions to that data. Geographic data (also known as georeferenced data) refers to data that has some association with a location relative to Earth. Coordinates, elevation, addresses, city names, zip (or postal) codes, and administrative and socioeconomic boundaries are all examples of geographic data. The GEOMETRY type enables us to easily work with coordinates such as latitude and longitude in our table columns, which can then be converted or combined with other types of geographic data using spatial functions. The type is abstract, meaning it cannot be instantiated directly, and polymorphic: the actual types supported for this data (and which will be used in table columns) are points, linestrings, polygons, multipoints, multilinestrings, multipolygons, and geometry collections.

In addition to creating GEOMETRY-typed data columns in tables, the new support also enables ingestion of geographic data from delimited text files using the existing COPY command. The data in the files is expected to be in hexadecimal Extended Well-Known Binary (EWKB) format, which is a standard for representing geographic data.

To show the new type in action, I imagined a scenario where I am working as a personal tour coordinator based in Berlin, Germany, and my client has supplied me with a list of attractions that they want to visit. My task is to locate accommodation for this client that is reasonably central to the set of attractions, and within a certain budget. Geographic data is ideal for solving this scenario. First, the points representing the attractions can be combined to form one or more polygons, which I can use to restrict my search for accommodation. In a single query I can then join the data representing those polygons with data representing a set of accommodations to arrive at the results. This spatial query is actually quite expensive in CPU terms, yet Amazon Redshift is able to execute it in less than one second.

To show my scenario in action, I first needed to source various geographic data related to Berlin. First, I obtained the addresses, and latitude/longitude coordinates, of a variety of attractions in the city using several 'top X things to see' travel websites. For accommodation I used Airbnb data, licensed under the Creative Commons 1.0 Universal "Public Domain Dedication". I then added zip code data for the city, licensed under Creative Commons Attribution 3.0 Germany (CC BY 3.0 DE). The provider for this data is Amt für Statistik Berlin-Brandenburg.

Any good tour coordinator would of course have a website or application with an interactive map, so as to be able to show clients the locations of the accommodation that matched their criteria. In real life, I'm not a tour coordinator (outside of my family!), so for this post I'm going to focus solely on the back-end processes: the loading of the data, and the eventual query to satisfy our client's request using the Amazon Redshift console.

My first task is to load the various sample data sources into database tables in an Amazon Redshift cluster. To do this I go to the Amazon Redshift console dashboard and select Create cluster. This starts a wizard that walks me through the process of setting up a new cluster, starting with the type and number of nodes that I want to create. In Cluster details I fill out a name for my new cluster, set a password for the master user, and select an AWS Identity and Access Management (IAM) role that will give Amazon Redshift read-only access to one of my buckets in Amazon Simple Storage Service (Amazon S3) when I come to load my sample data later. The new cluster will be created in my default Amazon Virtual Private Cloud (Amazon VPC) for the region, and I also opted to use the defaults for node types and number of nodes. You can read more about available options for creating clusters in the Management Guide. Finally, I click Create cluster to start the process, which will take just a few minutes.

With the cluster ready to use, I can load the sample data into my database, so I head to the Query editor and, using the pop-up, connect to the default database for the cluster. My sample data will be sourced from delimited text files that I've uploaded as private objects to an Amazon S3 bucket, and loaded into three tables. The first, accommodations, will hold the Airbnb data. The second, zipcodes, will hold the zip or postal codes for the city. The final table, attractions, will hold the coordinates of the city attractions that my client can choose from.
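The post says COPY expects geometry values in hexadecimal EWKB. The Python below is a minimal sketch of what one such value looks like for a single 2-D point. It assumes the PostGIS-style EWKB layout: a byte-order marker, then a geometry type code with an optional "SRID present" flag, then the optional SRID, then the X and Y doubles. `point_to_hex_ewkb` is a hypothetical helper written for illustration. It is not part of any AWS tool.

```python
import struct
from typing import Optional

# Assumed PostGIS-style EWKB constants (illustrative, not an AWS API).
EWKB_SRID_FLAG = 0x20000000  # flag bit in the type field meaning "an SRID follows"
WKB_POINT = 1                # OGC WKB geometry type code for Point


def point_to_hex_ewkb(x: float, y: float, srid: Optional[int] = None) -> str:
    """Hypothetical helper: hex EWKB for POINT(x y), little-endian encoding."""
    parts = [struct.pack("<B", 1)]  # byte-order marker: 1 = little-endian (NDR)
    if srid is not None:
        # Type code with the SRID flag set, followed by the SRID itself.
        parts.append(struct.pack("<II", WKB_POINT | EWKB_SRID_FLAG, srid))
    else:
        parts.append(struct.pack("<I", WKB_POINT))
    parts.append(struct.pack("<dd", x, y))  # X (longitude) first, then Y (latitude)
    return b"".join(parts).hex().upper()
```

For one attraction, `point_to_hex_ewkb(13.4, 52.5, 4326)` returns a string that begins with `0101000020E6100000`. A string like this could be written as one field of a delimited line for COPY to load into a GEOMETRY column. SRID 4326 (WGS 84) is an assumption here, because the post does not say which SRID, if any, the sample files carry.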
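The scenario restricts the accommodation search to polygons formed from the attraction points and then applies a budget. The post does this in a single Amazon Redshift spatial query. Purely to illustrate the geometry involved, here is a minimal Python sketch built on several assumptions:

- It swaps in a convex hull (Andrew's monotone chain) as one plausible way to form a single polygon. The post does not say which spatial functions its query uses.
- It treats longitude/latitude as flat x/y coordinates, which is a reasonable approximation across one city.
- The listing tuples of `(name, (lon, lat), nightly_price)` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point, which repeats the other chain's first point.
    return lower[:-1] + upper[:-1]


def inside_convex(hull: List[Point], p: Point) -> bool:
    """True if p is inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False  # degenerate hull (all attractions collinear)
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if cross(a, b, p) < 0:  # p lies to the right of edge a->b: outside
            return False
    return True


def candidate_listings(attractions: List[Point], listings, max_price: float) -> List[str]:
    """Hypothetical filter: names of listings within budget and inside the hull."""
    hull = convex_hull(attractions)
    return [name for name, loc, price in listings
            if price <= max_price and inside_convex(hull, loc)]
```

For example, `candidate_listings([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)], [("a", (1, 1), 80), ("b", (5, 1), 50), ("c", (3, 3), 200)], 100)` keeps only `"a"`. Listing `"b"` is outside the hull, and listing `"c"` is over budget. In Redshift itself, the equivalent work happens in SQL against the GEOMETRY columns rather than in client code.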