I ran into an issue using the collection groupBy() method where I wanted to "roll up" line items so each SKU is listed once with a quantity, instead of one item per entry.
This worked fine until "SKU sharing" was introduced, where the same SKU could have different price points.
In this case it was a free opening night, but the concept of "early bird pricing" was introduced at the same time and was expected to share the same SKU.
I'm unsure whether that is against typical eCommerce rules, but since this only synchronizes orders into a centralized system, it isn't a widespread concern.
The code looked something like:
return $registrations->groupBy('ticketLedgerCode')->map(function (Collection $registrations, string $key) {
    $quantity = $registrations->count();
    $registration = $registrations->first();

    return [
        'line_type' => 1,
        'item_number' => $registration?->ticketLedgerCode,
        'description' => $registration?->ticketName,
        'unit_price' => $this->getAmountFromStripe($registration?->price),
        'discount_code' => $registration?->promoCode,
        'discount_price' => $registration?->price,
        'discount_charge' => $registration?->charge,
        'quantity_ordered' => $quantity,
        'quantity_UOM' => "EA", // default
        'start_date' => date('Y-m-d'),
        'duration' => 12,
    ];
})->values()->toArray();
You may notice part of the problem immediately. The groupBy
method converts the existing collection into subcollections, where each $registrations passed to the closure is the group of items sharing one ticketLedgerCode.
Calling first()
here is where the problem originates, as that group can contain different prices I didn't account for.
After mapping over the result we call values()
because the collection is keyed by ticketLedgerCode,
which we don't really need; we only care about the underlying results.
The groupBy
method also accepts an array of keys, but that was harder for me to reason about and more difficult to traverse down to the level I cared about.
With the array form, each group in $registrations
would have the price as a key and a collection of the items that matched that price.
The result of the nesting looks something like:
{
    "123456789001": {
        "25900": [
            (registration)
        ]
    }
}
The approach that was easier to reason about was to nest another groupBy
call for the price
field:
return $registrations->groupBy('ticketLedgerCode')->map(function (Collection $itemNumbers, string $key) {
    return $itemNumbers->groupBy('price')->map(function (Collection $prices, string $key) {
        $quantity = $prices->count();
        $registration = $prices->first();

        return [
            'line_type' => 1,
            'item_number' => $registration?->ticketLedgerCode,
            'description' => $registration?->ticketName,
            'unit_price' => $this->getAmountFromStripe($registration?->price),
            'discount_code' => $registration?->promoCode,
            'discount_price' => $registration?->price,
            'discount_charge' => $registration?->charge,
            'quantity_ordered' => $quantity,
            'quantity_UOM' => "EA",
            'start_date' => date('Y-m-d'),
            'duration' => 18,
        ];
    })->values();
})->values()->flatten(1)->toArray();
I had originally used first()
instead of flatten(1),
as I incorrectly assumed the outer collection would always hold a single group, but with multiple SKUs and prices the output differs from the JSON above:
{
    "123456789000": {
        "19900": [
            (registrations)
        ],
        "25900": [
            (registrations)
        ]
    },
    "123456789001": {
        "19900": [
            (registrations)
        ],
        "25900": [
            (registrations)
        ]
    }
}
The JSON notation is a little misleading in its use of objects vs. arrays, but in Laravel, and PHP in general, this is represented as nested associative arrays.
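As PHP sees it, the single-SKU result above is:
[
    '123456789001' => [
        '25900' => [
            // (registration)
        ],
    ],
]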
It turns out there is an approach using groupBy(['ticketLedgerCode', 'price'])
that I only worked out by writing this journal entry.
Instead of $itemNumbers->groupBy('price')->map,
the internal section becomes just $itemNumbers->map,
as sketched below.
If I were to nest a third or fourth time, I would need to map down to those levels, and flatten(1)
would roll everything back together.
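If I did switch to the array form, the reworked version would look roughly like this (an untested sketch of the same mapping, trimmed to the fields that matter):
return $registrations->groupBy(['ticketLedgerCode', 'price'])
    ->map(function (Collection $itemNumbers) {
        // Each group is already keyed by price, so the inner groupBy('price') disappears.
        return $itemNumbers->map(function (Collection $prices) {
            $registration = $prices->first();

            return [
                'item_number' => $registration?->ticketLedgerCode,
                'unit_price' => $this->getAmountFromStripe($registration?->price),
                'quantity_ordered' => $prices->count(),
                // ...the remaining fields from the version above
            ];
        })->values();
    })->values()->flatten(1)->toArray();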
The real superstar here is flatten(1),
as I incorrectly assumed it was the same as flatten(),
but it isn't. The depth defaults to infinity, which turns an array of records into a flat array of every single property value, which is not what I wanted.
I'm not going to change the code to use an array as I don't mind the explicit groupBy
call.
I had also tried using both Cody and ChatGPT to explain the issue, and while Cody generally did better during its beta, it wouldn't produce what I was looking for.
ChatGPT 3.5, on the other hand, correctly intuited that flatten(1)
was what I needed.
AI code tools are great for explanations, and correct completions in Cody feel like magic, but the hallucinations range from subtle to extremely problematic.
Wed Jul 19
I ran into an issue where I needed to process a number of Stripe webhooks stored in a webhook_calls
table.
To mark progress, I thought it would be useful to join the invoices
table, but I ran into a snag.
In the webhook payload,
the JSON keeps its double quotes, so an invoice shows as something like "in_"
while in the invoices
table it isn't quoted, as the varchar covers it.
I looked into the QUOTE()
function first, but it turns out TRIM('"' FROM)
is what I was after: it trims the leading and trailing quote characters to match the invoices
datatype.
An example of the SQL statement used:
SELECT
    webhook_calls.id,
    payload_type,
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.id') AS `type_id`,
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.object') AS `type`,
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].metadata.user_id') AS `user_id`,
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.nickname') AS `nickname`,
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.metadata.base_sku') AS `base_sku`,
    webhook_calls.created_at
FROM
    webhook_calls
LEFT JOIN
    invoices ON invoices.invoice_id = TRIM('"' FROM JSON_EXTRACT(webhook_calls.payload, '$.data.object.id'))
WHERE
    MONTH(webhook_calls.created_at) = 7
    AND YEAR(webhook_calls.created_at) = 2023
    AND JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.metadata.base_sku') IS NOT NULL
    AND invoices.invoice_id IS NULL
ORDER BY
    JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].metadata.user_id') DESC,
    payload_type ASC
This gets the webhook_calls
records created in July 2023 with a non-null base_sku,
as this processor ignores any products that aren't explicitly configured. That pared down thousands of events into a manageable number.
We process only invoice events, so this was enough to create a list where I could manually match each invoice.created
to its paired invoice.payment_succeeded
event.
I thought about automating this, but there weren't that many records to deal with, and I wasn't sure what would happen outside of artisan tinker.
I needed the jobs to run synchronously to make sure the invoices
record was created first, followed by the subscriptions
record that gets created from the second event.
If I were to automate it, I would sort by created_at,
find the first event, dispatch it, then find the second and sleep for a second before dispatching again. As a sanity check I could verify the invoices
record was created, and the subscriptions
record after that.
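If I ever do automate it, I imagine something along these lines, run from tinker. This is an untested sketch: WebhookCall is assumed to model the webhook_calls table, and ProcessStripeWebhook is a stand-in name for whatever job replays an event.
// Replay each event synchronously, oldest first.
WebhookCall::query()
    ->whereYear('created_at', 2023)
    ->whereMonth('created_at', 7)
    ->orderBy('created_at')
    ->each(function (WebhookCall $call) {
        ProcessStripeWebhook::dispatchSync($call); // runs inline, so the invoices row exists first
        sleep(1); // breathing room between invoice.created and invoice.payment_succeeded
    });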
I'm not the biggest fan of MariaDB, as I really wish it had MySQL's ->
and ->>
shorthand for the JSON functions, but the SQL above is compatible with both systems.
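For reference, the shorthand I'm missing and the portable spelling used above are equivalent:
-- MySQL-only shorthand:
SELECT payload->>'$.data.object.id' FROM webhook_calls;

-- Equivalent that runs on both MySQL and MariaDB:
SELECT JSON_UNQUOTE(JSON_EXTRACT(payload, '$.data.object.id')) FROM webhook_calls;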
Due to an issue with Digital Ocean Spaces in the NYC3 region, we migrated a 16TB bucket to their newest SFO2 region.
The primary bottleneck we faced was that ListObjectsV2
became exceptionally slow after the bucket filled up and user load began hammering it.
We tried numerous troubleshooting steps in the same region and would see some improvement before it periodically crashed again.
It also wasn't consistent: things would be fine some days but not others, and I suspect load was a contributing factor.
Why mention the backstory? Because I think it's worth highlighting that sometimes the quickest solution in any cloud offering is to move regions. Hindsight was 20/20 on the whole ordeal.
Where does k6 fit into this? My task was to load-test the new data center for peace of mind, knowing we couldn't simulate the type of load our users brought. I could test our droplet that leveraged Spaces, but the rate limits made that more trouble than it was worth.
This issue describes the problem I had, as well as my naivete showing in some rookie mistakes. Understanding how to alter the host was crucial, as Spaces does not conform to AWS's s3.region.amazonaws.com format but uses region.digitaloceanspaces.com instead.
With that out of the way, I tinkered with reimplementing listBuckets to see if I could isolate the problem. In a stroke of dumb luck, after getting Insomnia functional with the insomnia-plugin-aws-iam-v4
plugin for authentication, I stumbled on the fix.
It turns out that instead of signedRequest.body || ''
the snippet should be:
const res = http.request(method, signedRequest.url, signedRequest.body || null, {
    headers: signedRequest.headers,
})
Spaces seemingly cannot process a blank-string body, but it can handle null.
The problem was a fluke we had never seen in six years of leveraging Digital Ocean Spaces; it was intermittent enough that we never noticed it until recently, when it affected everyone. Switching regions wasn't easy either, as several processes are now slower with the droplets still in NYC3. Most problems, even some of our most annoying corrections, have not required a response this drastic.
I had also given up numerous times on finding a solution with k6 because it's an implementation of JavaScript on a Go runtime, so we can't always throw in random JS packages and hope for the best. I had also concluded we didn't need to load-test services that we couldn't meaningfully stress.
I'm not a fan of giving up on anything, though I know I can't win every battle. I don't technically have a full solution here either, as my attempts to parse the XML using the same methods aren't producing the results I expected. I believe this change could still be a net positive for S3-compatible stores like Minio, where I would want to stress-test an instance on Fly.io or other droplets.
I have a project that complements Laravel Forge by storing a centralized copy of the .env
file for all sites and servers in the fleet of Digital Ocean droplets.
There is an environment_raw
text column for the .env
file, alongside an environment
jsonb
column that holds id,
key,
and value
tuples when we look at the field in something like TablePlus.
The query below unnests that column to list every distinct key across the fleet:
WITH fields_rowset AS (
    SELECT
        jsonb_array_elements(environment) AS fields
    FROM
        sites
)
SELECT DISTINCT
    fields -> 'key' AS key_name
    -- , fields -> 'value' AS key_value
FROM
    fields_rowset
WHERE
    fields -> 'value' IS NOT NULL
ORDER BY
    key_name ASC
The following queries utilize a few different methodologies for searching the embedded schema. We start with a singular key-value pair.
SELECT
    *
FROM
    sites
WHERE
    environment @> '[{"key": "STRIPE_SECRET", "value": "sk_test_DUMMY"}]';

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_SECRET", "value": "sk_live_DUMMY"}]']::jsonb[]);

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_SECRET"}]']::jsonb[]);

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"value": "sk_live_DUMMY"}]']::jsonb[]);
The ARRAY
and ANY
keywords are a more versatile approach but require casting to jsonb[]
explicitly.
There is another syntax like environment -> 'key' @> '"APP_DEBUG"',
but I can't produce the expected result.
It comes back empty despite APP_DEBUG being configured in just about every Laravel environment, most likely because environment is a jsonb array at the top level, so environment -> 'key' is evaluated against the array itself rather than its elements and yields NULL.
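Unnesting first, the same trick as the CTE above, does work per element (a sketch):
SELECT
    *
FROM
    sites
WHERE
    EXISTS (
        SELECT
            1
        FROM
            jsonb_array_elements(environment) AS field
        WHERE
            field -> 'key' @> '"APP_DEBUG"'
    );
Containment also handles checking multiple pairs at once: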
SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_KEY", "value": "pk_test_DUMMY"}]', '[{"key": "STRIPE_SECRET", "value": "sk_test_DUMMY"}]']::jsonb[]);
While it is redundant for an embedded schema of key/value pairs, it makes the data model easier to query than guessing which variables are present on any given system. Some machines have WordPress installed with totally different settings than Laravel, and there are multiple Laravel versions where certain variables have changed names over the years.
There are shortcomings to the approach, as this data lives separately from Forge and the servers themselves, but it is a much faster interface than making API calls to every server to find this information. Another shortcoming was not having a versioned schema to keep historical data for auditing.
After creating a very rudimentary version of this, I realized tools like Doppler or Infisical offer a more robust centralized place for secrets management far beyond what my initial version could deliver.
Mon Nov 1
I took the weekend to work through how to add a name
column to the scaffolding generated by mix phx.gen.auth
to more closely match Laravel Jetstream's profile update flow. In Jetstream, the photo, name, and email address are part of the same change request.
For gen.auth,
email changes are gated behind a validation request happening first. This makes sense: you want to make sure the email is valid and accessible before you switch the user to it. The problem is that lumping the name into the same changeset creates a slight code smell.
Carving out saving the name involves somewhat of a side effect, as we modify email_changeset
in lib/opportunity_knocks/accounts/user.ex
to allow the pattern %{changes: %{name: _}} = changeset -> changeset |> delete_change(:email)
in the case statement. This allows the name to go through while pushing the email out of the change in the event both are updated at the same time. That lets us change apply_user_email
in lib/opportunity_knocks/accounts.ex
to use the snippet:
|> case do
  %{changes: %{name: _}} = changeset -> changeset |> Repo.update()
  %{changes: %{email: _}} = changeset -> changeset |> Ecto.Changeset.apply_action(:update)
end
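For reference, the matching change on the user.ex side lives in email_changeset's case statement. Roughly, based on my memory of the stock generator output (a sketch, not verbatim from the project):
def email_changeset(user, attrs) do
  user
  |> cast(attrs, [:email, :name])
  |> validate_email()
  |> case do
    # A name change passes through, dropping any email change that rode along.
    %{changes: %{name: _}} = changeset -> changeset |> delete_change(:email)
    %{changes: %{email: _}} = changeset -> changeset
    %{} = changeset -> add_error(changeset, :email, "did not change")
  end
end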
This is where the smell is particularly strong, as I'm doing the same pattern matching to allow the change in the case of name
and ignore it for email.
I believe that as I gain more Elixir knowledge I may refactor this, but I also know how costly deviation can be from my exposure to Laravel Shift. Shift uses a series of patches to perform framework upgrades: files that are not altered get changed easily, and when a deviation is detected (a merge conflict), Shift goes with its version so you are consistently applying your changes on top of the now-improved version. This is the approach we would all take, given time or time machines.
Tue Dec 1
Forgot to transfer environment variables for the following:
QUEUE_CONNECTION=redis
REDIS_DB=0
REDIS_CACHE_DB=1
REDIS_HORIZON_DB=2
0, 1, and 2 are the defaults, so this really didn't impact anything.
QUEUE_CONNECTION defaulted to synchronous jobs instead of going through Redis, so Horizon had nothing to do.
flushdb reset Horizon, and jobs run via dispatch() now show up as expected.
Tue Nov 17
Research into keeping the latest x rows
You can't DELETE with a skip/take: MySQL's DELETE supports LIMIT but not OFFSET, and it won't allow LIMIT inside an IN subquery either, hence the extra derived table.
DELETE FROM `apiLogs`
WHERE id <= (
    SELECT id
    FROM (
        SELECT id
        FROM `apiLogs`
        ORDER BY id DESC
        LIMIT 1 OFFSET 42 -- keep this many records
    ) foo
)
I ended up going with a more generic version: rather than starting from now and keeping the newest rows, we can start at zero and delete up to the count we're looking for.
SELECT
    `id`
FROM
    `apiLogs`
WHERE
    `application_id` NOT IN ('SAGESYNC', 'SAGESYNCMS', 'SAGESYNCXC4')
ORDER BY
    `created_at` ASC
LIMIT 3500
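Since a single-table DELETE in MySQL does accept ORDER BY and LIMIT (just no OFFSET), the select above should translate directly; something like:
DELETE FROM `apiLogs`
WHERE
    `application_id` NOT IN ('SAGESYNC', 'SAGESYNCMS', 'SAGESYNCXC4')
ORDER BY
    `created_at` ASC
LIMIT 3500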
Mon Nov 9
My first attempt was to have an array of application_id values, but this made the Collection->where() usage more difficult.
{
    "blocklist": [
        {
            "application_id": ["SAGESYNC", "SAGESYNCMS"],
            "status": ["422"]
        },
        {
            "application_id": ["SAGESYNCXC4"]
        }
    ]
}
Ultimately we had to settle on this form, as the where() clause matches scalar values rather than arrays.
{
    "blocklist": [
        {
            "application_id": "SAGESYNC",
            "status": ["422"]
        },
        {
            "application_id": "SAGESYNCMS",
            "status": ["422"]
        },
        {
            "application_id": "SAGESYNCXC4"
        }
    ]
}
/**
 * Filter columns against configured blocklist.
 *
 * @param string|null $application_id
 * @param string|null $method
 * @param string|null $uri
 * @param string|null $status
 * @return bool
 */
public function isBlocked($application_id, $method, $uri, $status)
{
    $result = false;
    $blocklist = collect($this->getBlocklist());
    $blocked = $blocklist->where('application_id', $application_id);

    $blocked->each(function ($block) use (&$result, $method, $uri, $status) {
        $methods = Arr::get($block, 'method', null);
        $uris = Arr::get($block, 'uri', null);
        $statuses = Arr::get($block, 'status', null);

        $filtered = $this->hasValue($methods, $method) ?? $this->hasValue($uris, $uri) ?? $this->hasValue($statuses, $status);

        if (is_null($filtered) || $filtered === true) {
            // $filtered is true when a value is found or null when all fields are null.
            $result = true;
        }
    });

    return $result;
}
/**
 * Whether an array contains a value (has() normally checks for keys, not values).
 *
 * @param array|null $array
 * @param mixed $value
 * @return bool|null
 */
private function hasValue($array, $value)
{
    if (is_null($array)) {
        return null;
    }

    return in_array($value, $array);
}
/**
 * Retrieve blocklist section from config file.
 *
 * @return null|array
 */
private function getBlocklist()
{
    $localDisk = \Storage::disk('local');
    $contents = $localDisk->get("data/apiLogs/config.json");
    $json = json_decode($contents, true);

    if (!is_null($json)) {
        return $json['blocklist'];
    }

    return null;
}
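For context, the check is meant to wrap the logging call, roughly like this (ApiLog is a stand-in for the actual model, and the field names are assumptions):
// Hypothetical call site: skip persisting entries that match the blocklist.
if (! $this->isBlocked($application_id, $method, $uri, $status)) {
    ApiLog::create(compact('application_id', 'method', 'uri', 'status'));
}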
Mon Nov 2
See https://laracasts.com/discuss/channels/laravel/the-process-has-been-signaled-with-signal-9
valet restart
Fri Nov 6
The Docker image and my local .tool-versions for asdf no longer work; building Erlang fails with:
asdf erlang make[4]: *** [obj/x86_64-apple-darwin19.6.0/opt/smp/inet_drv.o] Error 1
Exporting these CFLAGS before installing lets the build complete:
export CFLAGS="-O2 -g -fno-stack-check -Wno-error=implicit-function-declaration"
asdf install erlang 23.0.3
Wed Mar 25
MariaDB generated JSON columns are possible if we unpack MySQL's shorthand syntax into the underlying functions.
// MySQL's ->> shorthand, which MariaDB rejects:
// $table->string('payload_type')->storedAs('payload->>"$.type"')->after('name');

// Unpacked into the functions MariaDB understands:
$table->string('payload_type')->storedAs("JSON_UNQUOTE(JSON_EXTRACT(`payload`, '$.type'))")->after('name');