Serde: Prologue

Published on April 4, 2024

After diving into serde, I found that implementations of Deserializer make a few distinctions that are interesting in their own right and relevant to my other implementations too.

intermediate data model

Serde has an intermediate data model. As part of a Deserializer, you write functions that map your file format into it; as part of a Serializer, you write functions that map your data structure into it. I realised that the implicit intermediary I had constructed in my first few implementations was the Vec<Option<WsvValue>> that represents a row. This is what a developer would then manually convert into a more explicitly labelled data structure of their choice. Consequently, any benchmarking needs to include the transformation from this intermediate state to the target data structure.
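To make that extra transformation concrete, here is a minimal sketch of the manual step from a row to a labelled struct. The WsvValue newtype, the Person struct and row_to_person are all hypothetical stand-ins for illustration; the real WsvValue in my implementations may be shaped differently.

```rust
// Hypothetical stand-in for WsvValue; assumed here to be a simple newtype.
#[derive(Debug, Clone, PartialEq)]
struct WsvValue(String);

// Hypothetical target structure that a developer might want per row.
#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    nickname: Option<String>, // a WSV null maps to None
    age: u32,
}

// Manually maps the intermediate row into the labelled structure.
// This is the work that a benchmark would also need to count.
fn row_to_person(row: &[Option<WsvValue>]) -> Result<Person, String> {
    let name = match row.first() {
        Some(Some(WsvValue(s))) => s.clone(),
        _ => return Err("missing name".to_string()),
    };
    let nickname = match row.get(1) {
        Some(Some(WsvValue(s))) => Some(s.clone()),
        Some(None) => None,
        None => return Err("missing nickname column".to_string()),
    };
    let age = match row.get(2) {
        Some(Some(WsvValue(s))) => s.parse::<u32>().map_err(|e| e.to_string())?,
        _ => return Err("missing age".to_string()),
    };
    Ok(Person { name, nickname, age })
}
```

Calling row_to_person on a row holding "Ada", a null and "36" would yield a Person with no nickname and an age of 36. With serde, this mapping is what the derived Deserialize implementation does on the developer's behalf.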

parse, deserialize

The second distinction they make is between parse and deserialize functions. Parsing, to the serde crate, means identifying a part of the input. Deserializing is the step that can then transform the identified input into the relevant internal form, for example by replacing escape sequences. It's a specific sort of conversion. This is most relevant in our case for strings:

Character Stream
---parse--->
WSV String identified
---deserialize--->
WsvValue(String)

Parsing holds the logic for identifying a contiguous run of characters that represents a string according to the WSV spec. Deserializing then takes those characters, trims the surrounding double quotes and converts the escape sequences.
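The split described above can be sketched as two small functions. This is an illustration rather than my actual implementation: the names parse_wsv_string and deserialize_wsv_string and the WsvValue newtype are hypothetical, and the sketch assumes the WSV escapes are "" for a double quote and "/" for a line break.

```rust
// Hypothetical stand-in for WsvValue; assumed here to be a simple newtype.
#[derive(Debug, PartialEq)]
struct WsvValue(String);

// Parse: identify the quoted string at the start of `input` and return it
// as a raw slice, quotes and escapes untouched. Returns None if the input
// does not start with a double quote or the string is never closed.
fn parse_wsv_string(input: &str) -> Option<&str> {
    let bytes = input.as_bytes();
    if bytes.first() != Some(&b'"') {
        return None;
    }
    let mut i = 1;
    while i < bytes.len() {
        if bytes[i] == b'"' {
            match bytes.get(i + 1) {
                // "" is an escaped double quote (assumed from the WSV spec)
                Some(b'"') => i += 2,
                // "/" is an escaped line break (assumed from the WSV spec)
                Some(b'/') if bytes.get(i + 2) == Some(&b'"') => i += 3,
                // any other quote closes the string
                _ => return Some(&input[..=i]),
            }
        } else {
            i += 1;
        }
    }
    None
}

// Deserialize: take the raw slice found by the parse step, trim the outer
// quotes and convert the escape sequences into their real characters.
fn deserialize_wsv_string(raw: &str) -> WsvValue {
    let mut rest = &raw[1..raw.len() - 1];
    let mut out = String::with_capacity(rest.len());
    while let Some(pos) = rest.find('"') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos..];
        if let Some(tail) = after.strip_prefix("\"\"") {
            out.push('"');
            rest = tail;
        } else if let Some(tail) = after.strip_prefix("\"/\"") {
            out.push('\n');
            rest = tail;
        } else {
            // Not reachable for slices produced by parse_wsv_string;
            // keep the quote literally rather than panic.
            out.push('"');
            rest = &after[1..];
        }
    }
    out.push_str(rest);
    WsvValue(out)
}
```

One consequence of keeping the steps separate is that the parse step can hand back a borrowed slice without allocating. Only the deserialize step has to build a new String, and only it needs to know what the escapes mean.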