Data Integration - Data Import


This feature is only available on the Enterprise plan.

Data Import

The Data Import feature lets you send event data stored in AWS S3 or GCP GCS to Hackle. Data is imported once a day.

Supported Cloud Storage

| Cloud | Storage | Supported |
| --- | --- | --- |
| AWS | S3 | Y |
| AWS | Redshift | Coming soon |
| GCP | GCS | Y |
| GCP | BigQuery | Coming soon |

Requirements

The following tasks are required before importing data.

Create Key and Grant Permissions: GCP GCS

For GCP GCS, you can create a Key by referring to the GCP IAM > Creating and managing service account keys documentation.

The following permissions are required when creating a Key for GCS access.

storage.buckets.get
storage.objects.get
storage.objects.create
storage.objects.delete
storage.objects.list
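
Before handing the Key over, it can help to confirm that none of the permissions above is missing. The following is a minimal sketch, not part of Hackle's tooling: `missing_gcs_permissions` is a hypothetical helper, and the `granted` list is assumed to come from wherever you inspect the service account (for instance, the result of an IAM permission test against the bucket).

```python
# Permissions listed in the section above.
REQUIRED_GCS_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list",
})


def missing_gcs_permissions(granted):
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_GCS_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Hypothetical permissions reported for a service account.
    granted = [
        "storage.buckets.get",
        "storage.objects.get",
        "storage.objects.list",
    ]
    for permission in missing_gcs_permissions(granted):
        print(f"missing: {permission}")
```

An empty result means every permission in the list is present.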

Create Key and Grant Permissions: AWS S3

For AWS S3, you can create a Key and grant the necessary permissions by referring to the following documents.

  1. Follow the AWS Docs: Creating an IAM User documentation to create an AWS IAM User.

  2. Follow the AWS Docs: Creating IAM Policies documentation to create a Policy that includes the IAM Policy attached below. Then attach the IAM Policy to the IAM User created in the previous step.
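
For orientation only, the sketch below builds an IAM policy document whose S3 actions mirror the GCS permissions listed earlier (read, write, delete, and list objects, plus reading bucket metadata). The action list and the `build_s3_policy` helper are assumptions made for illustration; the policy supplied by Hackle is the authoritative one and may differ.

```python
import json


def build_s3_policy(bucket_name):
    """Build an IAM policy document (as a dict) scoped to one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-hackle-import-bucket" is a placeholder bucket name.
    print(json.dumps(build_s3_policy("my-hackle-import-bucket"), indent=2))
```

Splitting bucket-level and object-level actions into separate statements keeps each action paired with the resource type it applies to.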

Data Import Format

Data Import currently supports the Apache Parquet format. The schema for the Parquet data is shown below. Process and store your data in the format described in the table.

| Column Category | Column Name | Column Type | Column Value (Example) | Description |
| --- | --- | --- | --- | --- |
| Insert ID | insert_id | STRING | 8fb8e088-9245-4fce-bb87-7e09d9917ed6 | UUID value used to check for duplicate events. |
| Event Key | event_key | STRING | purchase | Event name. |
| Client Timestamp | ts | TIMESTAMP | 2023-01-01 00:01:02.333 (UTC) | Timestamp in UTC (sub-millisecond values are truncated). |
| Metric Value | metric_value | DECIMAL(24, 6) | 0.0 | Used for value calculations in analytics and experiments. (Store 0.0 if not needed.) |
| Identifiers | identifiers | Map<String, String> | { "id": "8fb8e088-9245-4fce-bb87-7e09d9917ed6", "device_id": "89ABCDEF-01234567-89ABCDEF", "user_id": "49591", "session_id": "1659710029.4.1.1659710504.0" } | Map containing user identifiers. (Optional) user_id: Logged-in user identifier (corresponds to userId when sent via Hackle SDK). (Required) id: Device identifier (corresponds to id when sent via Hackle SDK). (Required) device_id: Device identifier (corresponds to deviceId when sent via Hackle SDK). (Optional, stored when using GA) ga_session_id, ga_device_id. Identifier key values are stored in lowercase. |
| Event Properties | event_properties | Map<String, String> | { "product_id": "33537", "product_category": "LEISURE", "order_id": "291994100" } | Properties containing event information. Property key values are stored in lowercase. |
| User Properties | user_properties | Map<String, String> | { "grade": "GOLD", "date_signed": "2022-07-01", "date_recent": "2023-01-17" } | Properties containing user information. Property key values are stored in lowercase. |
| Platform Properties | platform_properties | Map<String, String> | Android example: { "osname":"Android", "appversion": "6.9.0", "language":"ko", "osversion":"12", "devicevendor":"samsung", "versionname":"6.77.0-DEBUG", "platform":"Mobile", "devicemodel":"SM-S908N" } iOS example: { "osname":"iOS", "appversion": "6.9.3", "language":"ko-KR", "osversion":"16.0.2", "devicevendor":"Apple", "versionname":"6.77.0", "platform":"Mobile", "devicemodel":"iPhone14,2" } | Properties containing platform information. (Required) osname (Android, iOS). (Required) appversion. Property key values are stored in lowercase. |

Below is a summary of the data format described in the table above.
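
As one way to apply these rules before writing Parquet, the sketch below normalizes a single event into a row keyed by the column names in the table. It is an illustration under stated assumptions rather than Hackle's implementation: `normalize_event` and its helpers are hypothetical names, the default rounding used for `metric_value` is an assumption, and a generated UUID is used when no `insert_id` is supplied.

```python
from datetime import datetime, timezone
from decimal import Decimal
import uuid

REQUIRED_IDENTIFIERS = ("id", "device_id")
REQUIRED_PLATFORM_PROPERTIES = ("osname", "appversion")


def lowercase_keys(mapping):
    """Copy a map, lowercasing keys and converting values to strings."""
    return {str(key).lower(): str(value) for key, value in (mapping or {}).items()}


def require_keys(mapping, required, column):
    """Raise ValueError if any required key is absent from the map."""
    missing = [key for key in required if key not in mapping]
    if missing:
        raise ValueError(f"{column} is missing required keys: {missing}")


def normalize_event(event_key, ts, identifiers, platform_properties,
                    metric_value="0.0", event_properties=None,
                    user_properties=None, insert_id=None):
    """Return one row as a dict keyed by the column names in the table."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware so it can be converted to UTC")
    ts_utc = ts.astimezone(timezone.utc)
    # Drop sub-millisecond precision, matching the truncation noted for `ts`.
    ts_utc = ts_utc.replace(microsecond=ts_utc.microsecond // 1000 * 1000)

    identifiers = lowercase_keys(identifiers)
    platform_properties = lowercase_keys(platform_properties)
    require_keys(identifiers, REQUIRED_IDENTIFIERS, "identifiers")
    require_keys(platform_properties, REQUIRED_PLATFORM_PROPERTIES,
                 "platform_properties")

    return {
        "insert_id": insert_id or str(uuid.uuid4()),
        "event_key": event_key,
        "ts": ts_utc,
        # Six decimal places to fit DECIMAL(24, 6); default rounding is an assumption.
        "metric_value": Decimal(str(metric_value)).quantize(Decimal("0.000001")),
        "identifiers": identifiers,
        "event_properties": lowercase_keys(event_properties),
        "user_properties": lowercase_keys(user_properties),
        "platform_properties": platform_properties,
    }


if __name__ == "__main__":
    row = normalize_event(
        "purchase",
        datetime(2023, 1, 1, 0, 1, 2, 333999, tzinfo=timezone.utc),
        {"id": "8fb8e088-9245-4fce-bb87-7e09d9917ed6",
         "device_id": "89ABCDEF-01234567-89ABCDEF", "user_id": "49591"},
        {"osName": "Android", "appVersion": "6.9.0"},
        event_properties={"product_id": "33537"},
    )
    print(row)
```

Rows shaped this way can then be written with whichever Parquet library you already use; the sketch deliberately stops short of choosing one.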

Processing Data for Data Import

Convert the data to the Parquet format described above and store it in the bucket, partitioned by day.

Below are examples of stored partitions and files.
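
To illustrate what storing by day can look like, the sketch below derives an object key from an event timestamp. The `<prefix>/dt=YYYY-MM-DD/<file>` layout, the `daily_object_key` helper, and the choice of the UTC day are all hypothetical; confirm the actual partition and file naming with Hackle.

```python
from datetime import datetime, timedelta, timezone


def daily_object_key(ts, prefix="events", file_name="part-0000.parquet"):
    """Build an object key that groups files by the event's UTC day.

    The `<prefix>/dt=YYYY-MM-DD/<file>` layout is a hypothetical example.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware so the UTC day is unambiguous")
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/dt={day}/{file_name}"


if __name__ == "__main__":
    # 08:30 on 2023-01-01 in UTC+9 is still 2022-12-31 in UTC.
    kst = timezone(timedelta(hours=9))
    print(daily_object_key(datetime(2023, 1, 1, 8, 30, tzinfo=kst)))
```

Deriving the day from the UTC timestamp keeps the partition consistent with the `ts` column, which is stored in UTC.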

Data Import Request

Please contact Hackle to request Data Import. The following information is required for Data Import.
