Why binary data?

Sometimes an integration flow has to carry binary data. Typical examples are:

  • Transferring product images together with the product description.
  • Transferring custom-encoded files before they are parsed and transformed into JSON messages. For example, the SFTP component may read a CSV file without parsing it, and the CSV component later parses it and transforms it into JSON messages.
  • Preserving the native data format while working with it in components. For example, it is sometimes necessary to work natively on XML without an XML-JSON-XML round trip.
  • Handling large amounts of data as a single batch. Memory is an expensive resource and your components should use it sparingly, but sometimes a large number of messages has to be batched. Attachment storage is a good accumulator for this, because you can safely and efficiently stream data to and from it.

How attachments work

elastic.io uses asynchronous message-oriented middleware (MOM) between the integration flow steps. It ensures the reliability (it is persistent) and scalability of integration flows. The MOM is multi-tenant (in the multi-tenant plans), so fairness, quality of service and internal load balancing have to be ensured. This is much simpler to implement if messages on the queues are roughly the same size, so that the broker doesn't spend too much time transferring a single multi-gigabyte message while thousands of smaller messages wait for their turn. In summary, it is impractical to stream large payloads through the MOM broker. Binary attachments are therefore offloaded to external storage and linked by reference within the elastic.io message.

We use a reserved place in the message structure called (unsurprisingly) attachments to store attachment references. See the sample below:

{
   "body": {
      "sku": "1234",
      "name": "Mac Book Pro",
      "vendor": "Apple"
   },
   "attachments": {
      "frontview.jpeg": {
         "content-type": "image/jpeg",
         "size": "45889",
         "url": "http://steward.marathon.mesos:8091/files/1cfc3a71-d7a7-44e6-a15e-ae18860d537c"
      }
   }
}


As you can see, attachments are stored as a map where the keys are attachment names and each value has the following fields:

content-type
[Optional] MIME content type of the attachment. It gives you an initial idea of what could be inside.
size
[Optional] Size of the attachment in bytes.
url
HTTP URL from which your component can download the attachment. The URL contains all the information needed for the download; no additional authentication should be required.
NOTE: Don't assume or expect anything about the format or the host/port part of that URL, as it can change without prior notice.
NOTE: Attachment URLs are internal, cluster-specific elastic.io URLs and currently cannot be accessed from outside the cluster.
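
To make the structure above concrete, here is a minimal Python sketch of how a component might walk the attachments map and stream one attachment to disk. The helper names iter_attachments and download_attachment are hypothetical and not part of any elastic.io library. The sketch assumes a plain HTTP GET on the URL is enough, as described above, and it only works from inside the cluster.

```python
import shutil
import urllib.request
from typing import Iterator, Optional, Tuple


def iter_attachments(message: dict) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (name, url, content_type) for every attachment in a message.

    Hypothetical helper: "content-type" and "size" are optional, so only
    "url" is treated as required.
    """
    for name, meta in message.get("attachments", {}).items():
        yield name, meta["url"], meta.get("content-type")


def download_attachment(url: str, target_path: str) -> None:
    """Stream one attachment to a local file without loading it into memory.

    The URL is treated as opaque and no authentication is added
    (assumption: a plain GET is sufficient inside the cluster).
    """
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)


# The sample message from this section.
message = {
    "body": {"sku": "1234", "name": "Mac Book Pro", "vendor": "Apple"},
    "attachments": {
        "frontview.jpeg": {
            "content-type": "image/jpeg",
            "size": "45889",
            "url": "http://steward.marathon.mesos:8091/files/1cfc3a71-d7a7-44e6-a15e-ae18860d537c",
        }
    },
}

for name, url, content_type in iter_attachments(message):
    print(name, content_type, url)
```

Streaming with shutil.copyfileobj keeps memory use flat regardless of the attachment size, which matters for the batch use case mentioned earlier.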

How to create a new attachment

The previous section showed how your component can access attachments and work with binary data. But what if your component receives or pulls binary data and you want to make it available as an attachment to the next component? This is also easy to do with the help of the elastic.io API.

To create a new attachment, send an HTTP POST request to the following URL:

curl https://api.elastic.io/v2/resources/storage/signed-url \
   -X POST \
   -u {EMAIL}:{APIKEY} 

Make sure you use the authentication credentials found in your container's environment variables. The response looks something like this:

{
  "get_url": "http://steward.marathon.slave.mesos:8200/files/ea941d07-3ff5-4df1-b812-1bae2f0b9c36",
  "put_url": "http://steward.marathon.slave.mesos:8200/files/ea941d07-3ff5-4df1-b812-1bae2f0b9c36"
}
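
The same request can be sketched in Python using only the standard library. The function names build_signed_url_request and request_signed_urls are hypothetical. The sketch assumes that the -u option of the curl call above maps to HTTP Basic authentication and that the response is the JSON object shown above. How you read the credentials from the environment is left to the caller, since the variable names are not specified here.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl example above.
SIGNED_URL_ENDPOINT = "https://api.elastic.io/v2/resources/storage/signed-url"


def build_signed_url_request(email: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with a Basic auth header (equivalent of curl -u)."""
    token = base64.b64encode(f"{email}:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SIGNED_URL_ENDPOINT,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def request_signed_urls(email: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response.

    Assumption: the response body contains "get_url" and "put_url"
    as in the sample response above.
    """
    with urllib.request.urlopen(build_signed_url_request(email, api_key)) as response:
        return json.load(response)
```

Separating request construction from sending makes the authentication logic easy to check without network access.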

Now you can use put_url to store your binary data. Place get_url into the attachments section of the outgoing message (as shown above) so that the next component can read your binary data.
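
A minimal Python sketch of these two steps follows. The helpers upload_to_put_url and add_attachment are hypothetical. The sketch assumes that the data is stored with an HTTP PUT to put_url (the exact upload method and required headers are not stated here) and it writes size as a string because the sample message above does so.

```python
import urllib.request
from typing import BinaryIO, Optional


def upload_to_put_url(put_url: str, data: BinaryIO, size: int,
                      content_type: str = "application/octet-stream") -> None:
    """Stream a binary file object to put_url.

    Assumption: the storage accepts an HTTP PUT with the raw bytes as body.
    Content-Length is set explicitly so the body is not sent chunked.
    """
    request = urllib.request.Request(
        put_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type, "Content-Length": str(size)},
    )
    with urllib.request.urlopen(request):
        pass  # A non-2xx status raises urllib.error.HTTPError.


def add_attachment(message: dict, name: str, get_url: str,
                   content_type: Optional[str] = None,
                   size: Optional[int] = None) -> dict:
    """Reference an uploaded attachment in the outgoing message."""
    entry = {"url": get_url}
    if content_type is not None:
        entry["content-type"] = content_type
    if size is not None:
        entry["size"] = str(size)  # The sample message stores size as a string.
    message.setdefault("attachments", {})[name] = entry
    return message
```

A component would typically call upload_to_put_url first and only add the attachment reference once the upload has succeeded, so that the next component never sees a URL without data behind it.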