Thursday, 6 September 2018

SQL Server Paging - The Holy Grail

DECLARE @startRow INT
SET @startRow = 50
;WITH cols
AS
(
    SELECT table_name, column_name,
        ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq,
        ROW_NUMBER() OVER(ORDER BY table_name DESC, column_name desc) AS totrows
    FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

More info on this at SqlServerCentral

Interpreting the query plan and speeding it up

  • Bookmark Lookup - An index was used, but as the index does not 'cover' the query, the query processor must use the row locator from the index to look up the actual data row. Unless the index can be altered to 'cover' the columns of the query, this cannot be optimised further.
  • Clustered Index Scan - Similar to a table scan (it's just that the table has a clustered index) and similarly undesirable, particularly for large tables. If you see these, then the columns in the query are either not indexed or the distribution statistics have led the query optimiser to decide the scan is as efficient or more so. Restructuring the query, or adding an index, can sometimes eliminate these operations.
  • Clustered Index Seek - The clustered index is used to find matching rows. This is optimal behaviour.
  • Compute Scalar - As it says - a scalar is being computed. If it is genuinely needed, it can't be optimised; if it's not needed, remove it!
  • Constant Scan - The query requires a constant value in some (or all) rows.
  • Hash Match - The query processor builds a hash table from the input rows. As subsequent rows are processed, their hash values are computed and compared to the hash table for matches. Queries with DISTINCT, UNION, or aggregates often require a hash table to remove duplicates. If such operations are required there is little you can do to optimise.
  • Index Scan - The non-clustered index is being used to locate a large number of data rows which must then be scanned. Unless you can restructure the query to return fewer rows this cannot be optimised.
  • Index Seek - A non-clustered index is being used and only a small part of the index is required. You've chosen your indexes well as this is one of the most efficient operations.
  • Index Spool - As rows are scanned tempdb is used to store a 'spool' table that can be used rather than re-reading input rows. This is an internal optimization for complex queries and cannot be changed.
  • Merge Join - Occurs when the two inputs contain sorted data and the query processor can merge them together, or when two or more indexes are used to query a table. Very efficient but to occur the joins must have access to sorted data – achieved by indexing on join or ORDER BY columns.
  • Nested Loop - In this join, one table is chosen as the inner and scanned for each row of the outer. This is only efficient for small numbers of rows. Restructuring the query can remove these joins. Note that if available memory is low these are more likely to occur.
  • Remote Query - As it says – a query occurring on a remote data source so you either need to optimise on the remote data source or move the necessary data to your local SQLServer.
  • Sort - Expensive, but often necessary.
  • Table Scan - Performed when the table has no clustered index (it's a heap). This is probably the least desirable operation to see, as the query processor must read each row into memory in order to decide whether it should be returned by the query. For tables of more than a few hundred rows you should add an appropriate index.

Some of the gains in performance come from writing well-structured, efficient queries that return the minimum amount of information needed, but most gains are made by choosing good indexes. Hence the basic recommendation is to ensure that there are indexes on all tables and that the statistics for those indexes are up to date. Not coincidentally, these are the two main factors that influence the query optimiser when it chooses an execution plan. At this stage I would recommend you run a few multi-table queries from the Northwind database, or your own, in Query Analyzer and assess whether the execution plans generated are optimal based on the above information.

Optimising queries

As just stated, indexes and rewriting queries are the most common options for optimisation, but you can also use query hints, though care should be taken to ensure you're not forcing the server to choose a sub-optimal execution plan. Generally query hints are not recommended - SQLServer knows best!

Considering indexes: for SQLServer to make use of an index, the first indexed column must be included in the WHERE clause as part of a qualification, or the index will not be considered. However, not all of the columns in a WHERE clause need to be included in the index for it to be chosen. There can only be one clustered index per table and, as a result, it should be chosen carefully. By default the primary key is the clustered index, but often this is not the best choice unless it is the only index that will exist on the table. Clustered indexes are best used for range queries, e.g. dates or numerical ranges. Non-clustered indexes work best on large tables when very few rows are being returned. The query optimiser will often choose a highly selective non-clustered index when the index columns are included in the join statements; when this occurs SQLServer can find the row needed in the index very quickly and get to the actual data quickly as well. If the index is not selective the process is much less efficient and the index may not be chosen. An exception is if the non-clustered index is a covering index. Foreign keys are usually good choices for non-clustered indexes as these columns are usually used in joins.

Regardless of the type of index, there are a few general guidelines for creating indexes:
    1. Index those columns often used by joins
    2. Be sure that the first column of the index is the column specified in most joins.
    3. Analyze your queries and ensure your indexes are being used; if not, they are an unnecessary overhead and/or could be better placed
    4. Consider filegroups: placing non-clustered indexes in a different filegroup, on a separate physical device from the data itself, can lead to performance gains

How to compare one query against another to see which is best

Highlight both queries, and press CTRL-L to view the query plans of both.

The two percentages will total 100%. In this case, the first query at 98% sucks, and the 2% one wins by a mile. The lower the % cost, the better.

 

What is the true cost of a query?

Click on the FIRST box of the query plan, and hover your mouse over it. A pop up window will appear. The cost of the query is the "Estimated Subtree Cost".

If the Estimated Subtree Cost is < 20, it's OK. If it's > 20, you really need to change the query, add an index, etc.

 

My query is slow, how to easily speed it up

I used to get asked this a lot, and it usually boils down to either an "OR" clause or a non-sargable query.

 

Get rid of 'OR' clauses

If your query contains an OR in the WHERE clause, split it into two queries: one using the left side of the OR clause, and the other the right side. Then combine the results with a UNION ALL. Don't use UNION, as it is slower due to having to remove duplicates between the two result sets.

Example:

Slow query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
        OR FC.name LIKE 'acme%';

Faster query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
UNION ALL
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FC.name LIKE 'acme%';

Even in a simple case like the two above, the query plan comparison is 55% vs 45%.

 

Non-Sargable queries

The most common thing that will make a query non-sargable is wrapping a column in a function in the WHERE clause:

SELECT ... FROM ... WHERE Year(myDate) = 2008

The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
 
WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009'

Some other examples:
 
Bad:  Select ... WHERE ISNULL(FullName,'Simon Hughes') = 'Simon Hughes'
Fixed: Select ... WHERE ((FullName = 'Simon Hughes') OR (FullName IS NULL))

Bad:  Select ... WHERE SUBSTRING(Forename, 1, 3) = 'Sim'
Fixed: Select ... WHERE Forename Like 'Sim%'

Bad:  Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm, -30, GetDate())

Wednesday, 18 July 2018

MSMQ Best Practices

Performance recommendations

  1. Avoid unnecessary database lookups before deciding if a message can be dropped.
  2. Prioritization should be given in code for dropping the message as quickly and efficiently as possible.
  3. If database lookups are required in deciding if a message can be dropped, can these be cached (with a refresh timeout)? See the sketch after this list.
  4. If you are dropping messages, is there a better way to only receive the correct messages?
  5. If you are subscribed to an event processor publisher, does that publisher provide any kind of subscription filtering? Filtering can be achieved via database event subscription tables (a list of subscribers with their input queues, and a list of events each input queue is interested in).
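
As a rough illustration of point 3, the lookup data can be held in an in-memory cache and refreshed on a timeout, rather than hitting the database for every message. This is only a sketch using System.Runtime.Caching; IEventSubscriptionLookup and LoadWantedEventIds are hypothetical stand-ins for whatever data access you already have.

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

// Hypothetical data-access interface; replace with whatever lookup code you already have.
public interface IEventSubscriptionLookup
{
    IEnumerable<int> LoadWantedEventIds();
}

public class CachedEventFilter
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private static readonly TimeSpan RefreshTimeout = TimeSpan.FromMinutes(5);

    private readonly IEventSubscriptionLookup lookup;

    public CachedEventFilter(IEventSubscriptionLookup lookup)
    {
        this.lookup = lookup;
    }

    // True if this endpoint cares about the event; the database is hit at most
    // once per RefreshTimeout, so dropping unwanted messages stays cheap.
    public bool IsWanted(int eventId)
    {
        const string cacheKey = "WantedEventIds";

        var wanted = Cache.Get(cacheKey) as HashSet<int>;
        if (wanted == null)
        {
            // Cache miss (or expired): hit the database once, then cache the result.
            wanted = new HashSet<int>(lookup.LoadWantedEventIds());
            Cache.Set(cacheKey, wanted, new CacheItemPolicy
            {
                AbsoluteExpiration = DateTimeOffset.UtcNow.Add(RefreshTimeout)
            });
        }

        return wanted.Contains(eventId);
    }
}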

Separate event types

Publishers should translate event types into separate System.Types, with hierarchies to support grouping.
For example, with some kind of delivery event data, there could be group interfaces such as ProofOfDeliveryGroup, ProofOfCollectionGroup, etc.
Inherited from ProofOfDeliveryGroup would be the specific interface types such as ManualProofOfDelivery, DeliveredByCourierToSite, etc.
This allows subscribers to use NServiceBus idioms for subscribing to only the messages they need, and removes the need for the specialised publication filtering described in step 5 above.
NServiceBus recommends interfaces for events because you can effectively do multiple inheritance, which isn't possible with classes; this allows handling "groups" of events, as well as gentle evolution of events.
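
As a minimal sketch of that hierarchy (only the interface names come from the example above; the namespace and the properties are invented for illustration):

using NServiceBus;

namespace Messages.Events
{
    // Group interface: subscribe to this to receive every proof-of-delivery style event.
    public interface ProofOfDeliveryGroup : IEvent
    {
        int ConsignmentId { get; set; } // illustrative property
    }

    // Specific event types inherit from the group, so a subscriber can choose
    // the whole group or just the individual events it cares about.
    public interface ManualProofOfDelivery : ProofOfDeliveryGroup
    {
        string SignedBy { get; set; } // illustrative property
    }

    public interface DeliveredByCourierToSite : ProofOfDeliveryGroup
    {
        string CourierName { get; set; } // illustrative property
    }
}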

Separate the handlers

Favour multiple event handlers over a small number of more complex handlers.
Each handler should do one thing and be named after that one thing. The order of operation of the event handlers can be specified (see below) and this will start to read like a pseudo specification of what happens when an event of that type arrives.
As with any unit of code, message handlers should aim to follow the Single Responsibility Principle and only have one reason to change, so favour multiple small handlers over fewer large ones. However, this only holds if there is no implicit coupling between the handlers (e.g. through bespoke handler ordering); if there is, look for other ways to accomplish the same thing.
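
For example, rather than one OrderReceivedEventHandler that does everything, the same event can have several single-purpose handlers. This is only a sketch, assuming an NServiceBus version where handlers implement a void Handle method; OrderReceivedEvent and the handler bodies are illustrative placeholders that line up with the ordering example further down.

using NServiceBus;

public class OrderReceivedEvent : IEvent
{
    public int OrderId { get; set; }
}

public class ValidateOrderEventHandler : IHandleMessages<OrderReceivedEvent>
{
    public void Handle(OrderReceivedEvent message)
    {
        // Validates the incoming order, and nothing else.
    }
}

public class SendOrderEmailConfirmationEventHandler : IHandleMessages<OrderReceivedEvent>
{
    public void Handle(OrderReceivedEvent message)
    {
        // Sends the confirmation email, and nothing else.
    }
}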

General recommendations

  1. There should be only one place to subscribe to any given event, though publishers can be scaled out if necessary, as long as each instance shares the same subscription storage database.
  2. To avoid coupling, either:
    1. Publish a new message when a message is handled (successfully or otherwise), so another handler (potentially in a different endpoint) can handle it in a new transaction.
    2. Use NServiceBus Sagas.
  3. Using separate messages and/or sagas allows implementation of the "no business reason to change" philosophy, where all failures are technical failures and attempts can be made to overcome them with automatic retries etc., using separate, chained transactions. This is especially helpful when dealing with resources such as email or the file system that do not participate in the ambient distributed transaction while a message is being handled.
  4. It is possible to perform validation/mutation of messages before any handlers are invoked (message mutators). Again, prefer this over handler ordering. A sketch of such a mutator follows this list.
    1. Mutators are not automatically registered using dependency injection.
    2. Mutators are registered using:
      endpointConfiguration.RegisterMessageMutator(new MyIncomingMessageMutator());

      endpointConfiguration.RegisterMessageMutator(new MyOutgoingTransportMessageMutator());
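
For completeness, here is a minimal sketch of what an incoming mutator such as MyIncomingMessageMutator might look like, assuming an NServiceBus version whose mutators implement IMutateIncomingMessages with a context parameter (as implied by the RegisterMessageMutator call above). SomeMessage and its Reference property are invented for illustration.

using System.Threading.Tasks;
using NServiceBus.MessageMutator;

// Placeholder message type used by the sketch below.
public class SomeMessage
{
    public string Reference { get; set; }
}

public class MyIncomingMessageMutator : IMutateIncomingMessages
{
    public Task MutateIncoming(MutateIncomingMessageContext context)
    {
        // Validate or tweak the message before any handler is invoked.
        var message = context.Message as SomeMessage;
        if (message != null && string.IsNullOrWhiteSpace(message.Reference))
        {
            message.Reference = "UNKNOWN";
        }

        return Task.CompletedTask;
    }
}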

Handler ordering

NSB documentation
Multiple classes may implement IHandleMessages for the same message. In this scenario, all handlers will execute in the same transaction scope. These handlers can be invoked in any order but the order of execution can be specified in code.
The way NServiceBus works is:
  1. Find the list of possible handlers for a message.
  2. If an order has been specified for any of those handlers, move them to the start of the list.
  3. Execute the handlers.
The remaining handlers (i.e. ones not specified in the ordering) are executed in a non-deterministic order.

Specifying one handler to run first

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.SpecifyFirst<HandlerB>();
    }
}

Specifying multiple handlers to run in order

public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(HandlerB),
            typeof(HandlerA),
            typeof(HandlerC));
    }
}

Example

public class OrderReceivedEventHandlerOrdering : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(ValidateOrderEventHandler),
            typeof(CheckForDuplicateOrderEventHandler),
            typeof(PlaceOrderEventHandler),
            typeof(SendOrderEmailConfirmationEventHandler));
    }
}

With the configuration API

This is typically done within the EndpointConfig class.
configuration.LoadMessageHandlers(
    First<HandlerB>
    .Then<HandlerA>()
    .AndThen<HandlerC>());

Preferred method

Using the interface ISpecifyMessageHandlerOrdering is the preferred method, as these can be placed within the area of concern. This makes it easier to maintain as you don't have to go searching for the ordering of the handlers.

Dropping messages

If you are not going to process all messages, but decide to drop/filter some out, have a separate handler for these and make this the first handler:
public class SpecifyMessageHandlerOrder : ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(
            typeof(MessageFiltering), // FilterMessage, IgnoreInapplicableEvents, IgnoreIfNotLatestEvent, etc
            typeof(HandleSomeMessage));
    }
}
You may wish to have several message filters, each with their own criteria: order.Specify(First<FilterOne>.Then<FilterTwo>()); etc. (the generic arguments here are placeholders for your own filter handlers).
The takeaway is to be as efficient as possible in dropping messages that are of no interest.

Event processor filtering

In general, though, it's preferable that publishers simply publish all events that are subscribed to. This leaves the subscribers in charge of which messages they receive, which of those to ignore, and how. Filtering of published messages increases coupling between publisher and subscriber and should only be used as a last resort, when subscribers have not been implemented/deployed in a scalable way.

However, if your events come from CDC (Change data capture) and you want to take advantage of event filtering, you need something similar to the below:

CREATE TABLE EventProcessor
(
 Id INT NOT NULL IDENTITY(1,1),
 [Name] VARCHAR(200) NOT NULL,
 [Description] VARCHAR(512) NULL,
 [EndpointAddress] VARCHAR(512) NULL, -- Name of msmq queue
 [Enabled] BIT NOT NULL,
 CONSTRAINT [PK_EventProcessor] PRIMARY KEY CLUSTERED (Id)
);
CREATE TABLE EventProcessorEventFilter
(
 Id INT NOT NULL IDENTITY(1,1),
 EventProcessorId INT NOT NULL,
 WantedEventId INT NOT NULL,
 CONSTRAINT [PK_EventProcessorEventFilter] PRIMARY KEY CLUSTERED (Id)
);
GO
CREATE UNIQUE INDEX [IX_EventProcessorEventFilter] ON EventProcessorEventFilter (EventProcessorId, WantedEventId); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__EventProcessor] FOREIGN KEY (EventProcessorId) REFERENCES EventProcessor (Id); 
 
ALTER TABLE EventProcessorEventFilter ADD CONSTRAINT [FK_EventProcessorEventFilter__SomeEventEnumTable] FOREIGN KEY (WantedEventId) REFERENCES SomeEventEnumTable (Id);
GO

Tuesday, 22 May 2018

AI Reading comprehension scores

In January 2018 there were a great many blog posts about AI surpassing human reading comprehension scores.
Human comprehension of a piece of text scores 82.304% (Human Performance, Stanford University, Rajpurkar et al. '16).

As of March 19 2019, AI is now hitting scores of 83.877, by QANet (ensemble) Google Brain & CMU.

Plotting the scores on a graph:
 

Extrapolating the scores, we should see an AI hit 100% sometime in 2020.
Spreadsheet I created for the graph above: AI-Comprehension-score.xlsx

Original data source: https://rajpurkar.github.io/SQuAD-explorer/

Thursday, 7 December 2017

Stock prediction using Azure Machine Learning

I like dabbling in the stock market, and wondered if artificial intelligence could help predict when I should buy/sell stock.

Using one of the FANG stocks (Facebook, Amazon, Netflix, Google) I'll choose Netflix for the purposes of this blog post.

Here is what Netflix has done in the last 3 months:
The top graph is Fast Stochastics (12,26)
As you can see, predicting the future is not easy.

So my first step was to not use AI at all, but to write a program to help me decide which values of %K and %D I should use for the Fast Stochastics graph, using the last 5 years of stock history for Netflix.

Historical data

This can be obtained from Yahoo finance.
Change the Time period to 5Y and press Apply. Then click the download data link to obtain your CSV file.



Best %K %D Fast Stochastics values for Netflix

The program I wrote takes into account that buying stock and selling stock incur a fee, and that the buy and sell values are not the same (bid/ask difference).
The fee for buying/selling stock is £10 (Motley Fool), and Netflix shares incur a currency fee of £49. So each Netflix trade would cost me £59.

I will use £15,000 as an example investment. If I had simply bought the stock 5 years ago and sold it 5 years later, I would have £219,342.80.

The program I wrote iterates through %K values of 3 to 100; for each value of %K it iterates %D values of 2 to 100; and for each value of %D it iterates a buy/sell trigger on the difference between %K and %D from 1 to 100. That's 941094 iterations in all. Each buy/sell incurs fees, and each time we sell, the sell price is below the price we could buy back at. A sketch of this search loop is shown after the results below.
So the target to beat is £219,342.80, and the top results are:
k: 4, d: 4, kdDiff 55, profit: £166,469.40 loss: -£8,350.28, gain: £174,819.70, money £180,997.40
k: 52, d: 40, kdDiff 42, profit: £161,173.20 loss: -£3,937.38, gain: £165,110.60, money £175,583.20
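
A rough sketch of that brute-force search is below. The Simulate method stands in for the back-testing logic (computing the fast stochastic lines, applying the trade fee and the bid/ask spread, and tracking cash); it, the PricePoint type, and all the names here are placeholders, not the actual program.

using System;
using System.Collections.Generic;

public static class StochasticParameterSearch
{
    // Placeholder for one daily price record loaded from the Yahoo Finance CSV.
    public class PricePoint
    {
        public DateTime Date { get; set; }
        public decimal High { get; set; }
        public decimal Low { get; set; }
        public decimal Close { get; set; }
    }

    public static void FindBestParameters(IList<PricePoint> history)
    {
        decimal bestMoney = 0m;
        int bestK = 0, bestD = 0, bestKdDiff = 0;

        // Brute-force the parameter ranges described above.
        for (var k = 3; k <= 100; k++)
        {
            for (var d = 2; d <= 100; d++)
            {
                for (var kdDiff = 1; kdDiff <= 100; kdDiff++)
                {
                    // Back-test this combination over the full 5-year history,
                    // starting with £15,000 and paying £59 per trade.
                    var finalMoney = Simulate(history, k, d, kdDiff, startingMoney: 15000m, feePerTrade: 59m);
                    if (finalMoney > bestMoney)
                    {
                        bestMoney = finalMoney;
                        bestK = k;
                        bestD = d;
                        bestKdDiff = kdDiff;
                    }
                }
            }
        }

        Console.WriteLine("k: {0}, d: {1}, kdDiff {2}, money £{3:N2}", bestK, bestD, bestKdDiff, bestMoney);
    }

    private static decimal Simulate(IList<PricePoint> history, int k, int d, int kdDiff,
        decimal startingMoney, decimal feePerTrade)
    {
        // Placeholder: calculate the fast stochastic %K/%D lines for (k, d), buy when
        // %K rises above %D by at least kdDiff, sell when it falls below by at least kdDiff,
        // apply the trade fee and bid/ask spread on every trade, and return the final cash value.
        throw new NotImplementedException();
    }
}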

Here is what a fast stochastic (4,4) looks like:
The difference between the %K (black line) and %D (Red line) must be 55 in order to trigger a buy/sell.

The 2nd best graph looks like:
The difference between the %K (black line) and %D (Red line) must be 42 in order to trigger a buy/sell.

Azure Machine Learning and AI


First off, we need the historical data, along with the correct buy/sell flag depending on what tomorrow brings. This is easy to work out as looking back in time is easy.
Using SQL Server, I imported the historical data into a table. I did this by using SSMS and right-clicking on the database --> Tasks --> Import Data. Select the NFLX.csv file as the source, and "SQL Server Native Client" as the destination. Press Next a few times and Finish. And voila, the data is now in a table called NFLX. I left everything as string data, which is fine for my import.

Create a table with:
CREATE TABLE StockPrediction
(
    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [name] VARCHAR(50) NOT NULL,
    [date] DATETIME NOT NULL,
    [open] DECIMAL(10,4) NOT NULL,
    [high] DECIMAL(10,4) NOT NULL,
    [low] DECIMAL(10,4) NOT NULL,
    [close] DECIMAL(10,4) NOT NULL,
    adj_close DECIMAL(10,4) NOT NULL,
    volume int NOT NULL,
    buy BIT null
)


Feel free to import any other data you wish into separate tables. Hence the need for the name discriminator.

The insert is as follows:
INSERT INTO StockPrediction (name,date,[open],high,low,[close],adj_close,volume)
    SELECT 'NFLX',[Date],[Open], High, Low, [Close], [Adj Close], Volume FROM nflx
    UNION ALL
    SELECT 'PPH',[Date],[Open], High, Low, [Close], [Adj Close], Volume FROM PPH


Next, we need to correctly set the buy column by looking at the next day's close. That should tell us whether we should buy or sell. This is done via:
UPDATE  StockPrediction
SET     buy = CASE WHEN (future.[close] > StockPrediction.[close]) THEN 1
                   ELSE 0
              END
FROM    StockPrediction
        JOIN StockPrediction future
            ON future.id = StockPrediction.id + 1;
UPDATE  StockPrediction
SET     buy = 0
WHERE   buy IS NULL;


OK, so now we have historical data, and we have a buy/sell flag that I'd like the AI to try to learn to forecast.

Let's export this data into a CSV file.
Run this:
SELECT * FROM StockPrediction


Right click on the results and select "Save results as". I chose StockPrediction.csv on the desktop.
Edit the file and add in the header row: id,name,date,open,high,low,close,adj_close,volume,buy

Fire up your Azure portal and create yourself a Machine learning studio. Once that has been created, head over to Microsoft Azure Machine Learning Studio.

Import your dataset by clicking on "Datasets", and clicking +New icon.

I won't bore you with all the steps, and jump straight to it. Create an experiment, and drop in your StockPrediction.csv file.

Because I have several sets of stock data in the CSV file and only want the Netflix data, in the "Apply SQL Transformation" module I have:
select * from t1
where name='NFLX'
order by id;


For Split Data I have 0.95, meaning I want it to predict the final 5% of the data, which is about 2 months' worth. This is usually set to 0.7 for most models, so the last 30% is predicted. Predicting 2 months in advance is plenty for me, so I'm going with 0.95.

For Train Models, select the column 'buy'. This is what we want to predict.

I am using 2 AI engines, competing against each other, to see which is best for this particular problem. At the bottom, click the Run button. Once complete, right-click on "Evaluate Model" and select Evaluation results --> Visualise.
ROC
The results are better than guessing as they are over the midway line. The best graph to see is one that goes straight up the left hand side to the top, then across the top to the top right.

Precision/Recall
As you can see, it's about 50% accurate.

Trying with the "Two-Class Neural Network", and a few tweaked values for the AI settings, I get the following results:

Summary

Given the 5-year history of Netflix stock data, day-by-day prices alone are not good enough for the AI to predict with accurately. However, it's not bad. More fine-grained data (per minute) and many more columns are required to train it better. The extra columns could be how many times Netflix appears in tweets that day (and the sentiment of those tweets), whether directors bought/sold shares and, if so, how many, etc. The more, the better. It doesn't matter if YOU think a column is not relevant, as the AI training will work out what data works best. Always remove the human bias, as the Google chess/Go-playing AI showed when it beat the human-biased AI every time.
It was a good experiment, and one I will keep playing with. It made me realise that more data, perhaps even extraneous data, is also required to help train the AIs to give better predictions.

Thursday, 15 January 2015

Calling SQL Server stored procedures from Entity Framework

Using the methods below, you can obtain the stored procedure return value, along with your data, which I've not seen in other blogs.
A stored proc should return 0 for success, and any other value for a failure.

Here is an example table, filled with data:

CREATE TABLE DemoData
(
    id INT NOT NULL PRIMARY KEY,
    someValue DECIMAL(4,3) NOT NULL
)
GO
INSERT INTO DemoData(id, someValue)
VALUES (1, 1.23), (2, 2.34), (3, 3.45), (4, 4.56)

Here are our example stored procedures:
CREATE PROCEDURE GetDemoData(@maxId INT)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT id, someValue FROM DemoData WHERE id <= @maxId
END
GO

CREATE PROCEDURE AddTwoValues(@a INT, @b INT)
AS
BEGIN
    SET NOCOUNT ON;
    RETURN @a + @b -- Don't do this. Stored procs should return
                   -- 0 for success, and any other value for failure.
END
GO

CREATE PROCEDURE AddTwoValuesWithResult(@a INT, @b INT, @result INT OUTPUT, @result2 INT OUTPUT)
AS
BEGIN
    SET NOCOUNT ON;
    SET @result = @a + @b
    SET @result2 = @b - @a
END
GO

CREATE PROCEDURE ConvertToString(@someValue INT, @someString VARCHAR(20) OUTPUT)
AS
BEGIN
    SET NOCOUNT ON;
    SET @someString = '*' + CAST(@someValue AS VARCHAR(20)) + '*'
END
GO


Here is the C# code to call the above:
public class DemoData
{
    public Int32 Id { get; set; }
    public Decimal? SomeValue { get; set; }
}

public class MyDbContext : DbContext
{
    static MyDbContext()
    {
        Database.SetInitializer<MyDbContext>(null);
    }

    public MyDbContext()
        : base("Name=MyDbContext")
    {
    }

    public MyDbContext(string connectionString)
        : base(connectionString)
    {
    }

    public MyDbContext(string connectionString, 
        System.Data.Entity.Infrastructure.DbCompiledModel model)
        : base(connectionString, model)
    {
    }

    // Stored Procedures
    public int AddTwoValues(int a, int b)
    {
        var procResult = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = AddTwoValues @a, @b", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@a", 
                Value = a, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            new SqlParameter
            {
                ParameterName = "@b", 
                Value = b, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            procResult
        });

        return (int)procResult.Value;
    }

    public int AddTwoValuesWithResult(int a, int b, out int result, out int result2)
    {
        var resultParam = new SqlParameter
        {
            ParameterName = "@result", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };
        var result2Param = new SqlParameter 
        { 
            ParameterName = "@result2", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output };
        var procResult = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = AddTwoValuesWithResult @a, @b, @result OUTPUT, @result2 OUTPUT", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@a", 
                Value = a, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            new SqlParameter
            {
                ParameterName = "@b", 
                Value = b, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            resultParam, 
            result2Param, 
            procResult
        });

        result = (int)resultParam.Value;
        result2 = (int)result2Param.Value;

        return (int)procResult.Value;
    }

    public int ConvertToString(int someValue, out string someString)
    {
        var someStringParam = new SqlParameter
        {
            ParameterName = "@someString", 
            SqlDbType = SqlDbType.VarChar, 
            Size = 20, 
            Direction = ParameterDirection.Output
        };
        var procResult = new SqlParameter 
        { 
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output 
        };

        Database.ExecuteSqlCommand(
            "exec @procResult = ConvertToString @someValue, @someString OUTPUT", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@someValue", 
                Value = someValue, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            someStringParam,
            procResult
        });

        someString = (string)someStringParam.Value;
            
        return (int)procResult.Value;
    }


    public List<DemoData> GetDemoData(int maxId, out int procResult)
    {
        var procResultParam = new SqlParameter
        {
            ParameterName = "@procResult", 
            SqlDbType = SqlDbType.Int, 
            Direction = ParameterDirection.Output
        };
            
        var sqlQuery = Database.SqlQuery<DemoData>(
            "exec @procResult = GetDemoData @maxId", 
            new object[]
        {
            new SqlParameter
            {
                ParameterName = "@maxId", 
                Value = maxId, 
                SqlDbType = SqlDbType.Int, 
                Direction = ParameterDirection.Input
            }, 
            procResultParam
        }).ToList();

        procResult = (int) procResultParam.Value;

        return sqlQuery;
    }
}

To call the above and return the results:
using (var db = new MyDbContext())
{
    Console.WriteLine("AddTwoValuesWithResult");
    int result, result2;
    int procResult = db.AddTwoValuesWithResult(3, 7, out result, out result2);
    Console.WriteLine("result = " + result);
    Console.WriteLine("result2 = " + result2);
    Console.WriteLine("procResult = " + procResult);
    Console.WriteLine("");

    Console.WriteLine("ConvertToString");
    string someString;
    procResult = db.ConvertToString(56, out someString);
    Console.WriteLine("someString = " + someString);
    Console.WriteLine("procResult = " + procResult);
        Console.WriteLine("");

    Console.WriteLine("AddTwoValues");
    procResult = db.AddTwoValues(5, 10);
    Console.WriteLine("procResult = " + procResult);
    Console.WriteLine("");

    Console.WriteLine("GetDemoData");
    var list = db.GetDemoData(3, out procResult);
    Console.WriteLine("procResult = " + procResult);
    foreach (var i in list)
    {
        Console.WriteLine(i.Id + ", " + i.SomeValue);
    }
}

Yields the following output:
AddTwoValuesWithResult
result = 10
result2 = 4
procResult = 0

ConvertToString
someString = *56*
procResult = 0

AddTwoValues
procResult = 15

GetDemoData
procResult = 0
1 1.230
2 2.340
3 3.450

This will be available in the EntityFramework Reverse POCO Generator, available at https://visualstudiogallery.msdn.microsoft.com/ee4fcff9-0c4c-4179-afd9-7a2fb90f5838

Wednesday, 10 December 2014

Pregenerated Views for Entity Framework 6

When Entity Framework starts, it first builds a set of internal views that describe the database in an agnostic way. All further processing EF does (queries, updates, etc.) is performed against these views. Generating views, however, can be costly and will impact the start-up time of the application.
The start-up time for a 1,000-table context can be in excess of minutes.


However, this can be worked around by generating the views at design time using the EFInteractiveViews NuGet package. It allows you to pre-generate the views and save them to a file. Neat!

Install the NuGet package: https://www.nuget.org/packages/EFInteractiveViews
Source code and documentation: https://github.com/moozzyk/EFInteractiveViews
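
Typical usage, going by the package's documentation, is a single call made against the context before its first query, so the views are written to and subsequently read from a file. The namespace, helper class, and file path below are assumptions for illustration; check the package README for the exact details.

using System.Data.Entity;
using InteractiveViews; // assumption: adjust to the namespace the EFInteractiveViews package actually exposes

public static class ViewCacheBootstrap
{
    public static void Configure(DbContext context)
    {
        // First run: EF generates the views and they are written to the XML file.
        // Subsequent runs: the views are loaded from the file, skipping the expensive generation step.
        InteractiveViews.SetViewCacheFactory(
            context,
            new FileViewCacheFactory(@"C:\Temp\MyViews.xml"));
    }
}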

Thursday, 7 August 2014

I'm in MSDN Magazine this month :-)

I am in the August 2014 edition of MSDN Magazine, in the Data Points section.
The article is written by Julie Lerman on the EntityFramework Reverse POCO Generator I created.
MSDN link: http://msdn.microsoft.com/en-us/magazine/dn759438.aspx

Thursday, 10 July 2014

SQL Datetime comparison gotcha

I discovered a problem when inserting data into our database. My insert statement was checking for the existence of the data in the WHERE clause, to prevent duplicate data being inserted. None was detected, and the INSERT happened. However, the unique constraint rejected the data as it already existed in the database.

The problem was the data to be inserted was DATETIMEOFFSET(2) and the database field being inserted into was DATETIME.

To show you what I'm talking about, run the following:

DECLARE @dt  DATETIME          = '2014-07-07 09:49:33.000';
DECLARE @dto DATETIMEOFFSET(2) = '2014-07-07 09:49:33.00 +07:00';

PRINT CASE WHEN @dt = @dto THEN 'Equals matches'
           ELSE 'Equals does not match'
      END

PRINT CASE WHEN @dt = CAST(@dto AS DATETIME) THEN 'Cast matches'
           ELSE 'Cast does not match'
      END


Results in the following:
Equals does not match
Cast matches

The comparison (=) operator does not behave the same way as the implicit cast that happens when you insert the data. The CAST/CONVERT to DATETIME actually throws away the offset! Madness.

Tuesday, 1 April 2014

A new tutorial video

I have released a new and updated tutorial video for Entity Framework Reverse POCO generator.
It is available to stream or download at: www.reversepoco.com

Wednesday, 5 March 2014

EntityFramework Reverse POCO Generator v2.4 released

Download: here

What's new in v2.4.0:
  1. Removed use of System.Data.Entity.DLL from the installation template as it is no longer required for EF 6.
  2. Moved spatial types from System.Data.Spatial to System.Data.Entity.Spatial for EF 6.
  3. Singular names and camel casing configuration were accidentally combined by using the UseCamelCase boolean. Thanks to Rune Gulbrandsen.
  4. Added new flag IncludeComments. This controls the generation of comments in the output.
  5. Fixed bug in constructor where a UNIQUEIDENTIFIER column had default value. Thanks to gonglei.

Wednesday, 13 November 2013

EntityFramework Reverse POCO Generator v2.1 released

What's new in v2.1.0:
  1. Fixed bug if default constraint was not in expected format. Thanks to Filipe Fujiy.
  2. Now detects name clashes with C# keywords.
  3. Added "System." to DateTime.Now(), etc to prevent clashes with table field names.
  4. "Configuration" class names are now configurable as "Mapping", "Map", etc.
  5. Added support for Spatial.DbGeometry and Spatial.DbGeography. Thanks to Simply Foolish and Jorge Bustos.
  6. Can now have custom collection type for Navigation Properties. "ObservableCollection" for example. Thanks to Simply Foolish.
Watch the video and Download at visualstudiogallery

Tuesday, 12 November 2013

NServiceBus and encrypting strings in messages, plus how to remove NServiceBus from your messages library

Summary

There are a few blog posts about using NServiceBus and encrypting strings using the WireEncryptedString type. However, you can also mark an ordinary string to be encrypted via configuration, which is more desirable as it lets you remove the dependency on NServiceBus from your common Messages library.
There is a gotcha though: if you ever mix the two, WireEncryptedString fields go back to being unencrypted in MSMQ. This blog post explains how to easily fix that, and also how to remove NServiceBus from your Messages library. Win win.

Source code is available here. Compiling the project will automatically download packages via NuGet.

Initial project setup

Always have a separate Messages project to keep your messages in. In this example, the project is called Messages and contains the following class:
using NServiceBus;
 
namespace Messages
{
    public class MyMessage : IMessage
    {
        public WireEncryptedString Secret1 { get; set; }
        public string PlainString { get; set; }
    }
}

Note that it implements IMessage, and therefore has a dependency on NServiceBus. It also makes use of WireEncryptedString, which again has a dependency on NServiceBus.

Nothing special in the app.config other than to specify the message endpoint as SomeQueueName.
<configuration>
    <configSections>
        <section name="UnicastBusConfig" 
                 type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
    </configSections>
    
    <UnicastBusConfig>
        <MessageEndpointMappings>
            <add Messages="Messages" Endpoint="SomeQueueName" />
        </MessageEndpointMappings>
    </UnicastBusConfig>
</configuration> 

The endpoint configuration specifies that NServiceBus use Rijndael as the encryption service:
public class EndpointConfig : 
    IConfigureThisEndpoint, 
    AsA_Client, 
    IWantCustomInitialization
{
    public void Init()
    {
        Configure.With()
                 .DefaultBuilder()
                 .RijndaelEncryptionService();
    }
}

We also need to setup an encryption key. This is done by the following class:
public class ConfigOverride : 
    IProvideConfiguration<RijndaelEncryptionServiceConfig>
{
    public RijndaelEncryptionServiceConfig GetConfiguration()
    {
        return new RijndaelEncryptionServiceConfig
            {
                // This key could be fetched from a REST/WS call, 
                // Database, or a common xml/settings file.
                Key = "blah.blah.blah.blah.blah.blah..."
            };
    }
}

The final piece is to create a class that will actually send a message:
public class MessageSender : IWantToRunWhenBusStartsAndStops
{
    public IBus Bus { get; set; }

    public void Start()
    {
        var msg = new MyMessage
            {
                Secret1 = "You should not be able to read this in MSMQ",
                PlainString = "This is an ordinary string"
            };
        Bus.Send(msg);
        LogManager.GetLogger("MessageSender").Info("Sent message.");
    }

    public void Stop()
    {
    }
}

The resultant message looks like this:
<MyMessage>
    <Secret1>
        <EncryptedValue>
            <EncryptedBase64Value>QcJJfYP2SlOuDYNgFTQ8t8XI5D7zlwQzQXEFk8hcYT7gKUvqxN2Jg1UT7Q1CO929</EncryptedBase64Value>
            <Base64Iv>Q9Dc+k4V4LLPUc5+lEAtOQ==</Base64Iv>
        </EncryptedValue>
    </Secret1>
    <PlainString>This is an ordinary string</PlainString>
</MyMessage>

So far so good.

Now let's mark the PlainString to be encrypted using configuration instead of WireEncryptedString.
The first thing we do is to create a custom attribute we can use to mark strings to be encrypted:
namespace Messages
{
    public class MyEncryptionAttribute : Attribute
    {
    }
}

Then mark the strings to be encrypted:
public class MyMessage : IMessage
{
    public WireEncryptedString Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

Next, we need to tell NServiceBus to treat all fields that use MyEncryption attributes to be encrypted:

namespace Sender
{
    public class ConventionsConfiguration : 
        IWantToRunBeforeConfiguration
    {
        public void Init()
        {
            Configure
                .Instance
                .DefiningEncryptedPropertiesAs(type => 
                    type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MyEncryptionAttribute"));
        }
    }
}

So, run it and lets see what we get:
<MyMessage>
    <Secret1>
        <Value>You should not be able to read this in MSMQ</Value>
    </Secret1>
    <PlainString>k9FJRIfCMmyOW4ev8wytLIg56wEKsSbg1stmSo8Wu44=@XEx7Gkaks5nfhrQAuS2kZw==</PlainString>
</MyMessage>

Whoops!

The PlainString is now encrypted as we want, but in doing so the Secret1 string has become plain text. The Secret1 string is still a WireEncryptedString type, and yet it now insists on being plain text, just because we've used the DefiningEncryptedPropertiesAs convention on NServiceBus.


Let's fix it

We do this by changing ConventionsConfiguration class to also include the WireEncryptedString type as follows:
public class ConventionsConfiguration : 
    IWantToRunBeforeConfiguration
{
    public void Init()
    {
        Configure
            .Instance
            .DefiningEncryptedPropertiesAs(type =>
                type.PropertyType.Name == "WireEncryptedString" ||
                type.GetCustomAttributes(true)
                .Any(t => t.GetType().Name == "MyEncryptionAttribute"));
    }
}

Running this, the message now appears as:
<MyMessage>
 <Secret1>
  <EncryptedValue>
   <EncryptedBase64Value>4PHC9vcbOID08b8wbCl9tHdv8VH0aW8lFRFYGyHodwFkaV+YzF07aPPeRS3PVSCY</EncryptedBase64Value>
   <Base64Iv>/oFdQX9eTv7WWPJZIRwYsw==</Base64Iv>
  </EncryptedValue>
 </Secret1>
 <PlainString>P5vhzqd3NibLovHYMk8cVI/9k60jVDPERMHRPMMuCmg=@qbwJ2hrdeXegHbkisO4J8w==</PlainString>
</MyMessage>

Voila!

Both strings are now encrypted. You can see they are slightly different in the message.
To access Secret1, you do it via the .Value property on Secret1. To access the plain unencrypted text of PlainString, you simply access the string as normal. Both strings are encrypted and decrypted for you by NServiceBus.

How to remove NServiceBus from our Messages library

Now that we have a way to tag strings to be encrypted, we can alter our message class to be:
public class MyMessage : IMessage
{
    [MyEncryption]
    public string Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

Next we need to remove the IMessage interface, and remove all dependencies on NServiceBus. In a similar fashion to marking strings with attributes, we can mark message classes with an attribute.
public class MessageAttribute : Attribute
{
}

Then we tag the message as follows:
[Message]
public class MyMessage
{
    [MyEncryption]
    public string Secret1 { get; set; }
    
    [MyEncryption]
    public string PlainString { get; set; }
}

NServiceBus now needs to be told which messages it can use. This is done in our ConventionsConfiguration class by adding a call to .DefiningMessagesAs() as follows:
public class ConventionsConfiguration : 
    IWantToRunBeforeConfiguration
{
    public void Init()
    {
        Configure
            .Instance
            .DefiningEncryptedPropertiesAs(type =>
                type.PropertyType.Name == "WireEncryptedString" ||
                type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MyEncryptionAttribute"))
            .DefiningMessagesAs(type =>
                type.GetCustomAttributes(true)
                    .Any(t => t.GetType().Name == "MessageAttribute"));
    }
}

Let's run this to see what the message now looks like:
<MyMessage>
    <Secret1>Rv7L0pDwDlY9KOZQqXuoRPGnD8MxXUJ7dPCOfXnBpP337Egz59aGMY/z6yhW03op@CAeXrzjYBwXYiN2R1iGKrw==</Secret1>
    <PlainString>obwa5exdHIt7AFoQKypzSZIRQR/Q2srJgP3Valej54o=@T09tOdiPjMxFntdI1FJouw==</PlainString>
</MyMessage>

That's it

  1. I've removed all references to NServiceBus from my messages library. So no more messy versioning problems for other projects.
  2. Strings to be encrypted are tagged with an attribute, which we tell NServiceBus about.
  3. Message classes are tagged with an attribute, which we tell NServiceBus about.

Friday, 8 November 2013

What's new roundup for EntityFramework Reverse POCO Code First Generator

What's in the next version:
  1. Fixed bug if default constraint was not in expected format. Thanks to Filipe Fujiy.
What's new in v2.0:
  1. Fixed issue when running 'Transform All T4 Templates' from Visual Studio Build Menu. Thanks to JRoselle.
  2. Changing mappings during runtime is not possible/expensive. A specific DbModelBuilder can be used for each needed database schema. Thanks to meixger.
  3. Added ability to detect and use .IsRowVersion().
  4. Added many-to-many mappings. The generated code now includes calls to .Map(). Therefore the generated code will be different to what you had previously in v1 if you have many-to-many table mappings in your database. Hence the revision change to v2.0.0 as I'm using semantic versioning.
What's new in v1.12:
  1. VARBINARY(MAX) is now correctly assigned to byte[]. Thanks to Luke91577.
  2. Extending partial class support. Allow specification of file extension for partial classes (i.e. ".generated.cs"). Thanks to AB_dreeve.
  3. Fixes issues when targeting .NET 4. .NET 4.0 doesn't include the System.ComponentModel.DataAnnotations.Schema namespace, allow specification of TargetFrameworkVersion. Thanks to AB_dreeve.
What's new in v1.11:
  1. Including views is now working. For a view to be included, at least one column must not be nullable. Thanks to Delmo Carruzzo.
  2. Added selective generation of components (Poco, Context, UnitOfWork, PocoConfiguration). You can now build entities in your Model project, and context, configuration, unit of work in your Data project.  Thanks to kscelfo.
  3. Added PrependSchemaName flag. You can now control if the schema name is prepended to the table name.  Thanks to kscelfo.
What's new in v1.10:
  1. Added UseCamelCase flag to the tt file. You can now control if you want your table and column names CamelCase or left alone: i.e. FieldPersonOrder or field_person_order
What's new in v1.9.2:
  1. Removed the Linq de-duplication code as it is now being done in SQL. The Linq version did not include the schema name when de-duplicating FK's.
What's new in v1.9.1:
  1. Added DISTINCT to the foreign key SQL. This prevents duplicates from appearing.
    They can appear with SQL such as:
    ALTER TABLE [Mars].[Table2] WITH CHECK ADD CONSTRAINT [FK_Table2_Table1]
        FOREIGN KEY([Table1Id]) REFERENCES [Mars].[Table1] ([Id])
    ALTER TABLE [Mars].[Table2] CHECK CONSTRAINT [FK_Table2_Table1] 
    
What's new in v1.9.0:
  1. It now has the ability to read connection strings from other config files/projects.
    It starts by looking at the local project for the named connection string in the following files:
    • app.config
    • web.config
    • app.config.transform
    • web.config.transform
    • or edit the above list in the .tt and specify which file(s) to use
    If not found, it then looks for those files in all the projects within the solution.
What's new in v1.8.0:
  1. Can now optionally generate separate files. See the new boolean flag GenerateSeparateFiles.
  2. Removed the hard coded 300 second timeout. The timeout can be specified in the connection string.
What's new in v1.7.1:
  1. Add support for DateTime2
  2. Add support for UnitOfWork and the repository pattern. This means you can now unit test your repositories. See the source code for examples.
  3. Add IDisposable to db context.
What's new in v1.7.0:
  1. Add .HasPrecision(precision, scale) for fields that have scale. Thanks to @choudeshell
What's new in v1.6.0:
  1. Enhance ReadSchema performance with early table exclusion.
  2. Now supports columns with symbols and punctuation.
  3. Handles decimal default values.
What's new in v1.5.1:
  1. A bugfix for WCF. See issue https://efreversepoco.codeplex.com/workitem/4
What's new in v1.5.0:
Ability to add WCF (DataMember, DataContract attributes) support on entities. Requested by spatemp.