Generate a thumbnail image from the first page of a PDF using batch processing
This tutorial describes how to automatically create a thumbnail image from the first page of a PDF file when the file is registered as content, using batch processes.
Before you start
The thumbnail image is saved in Google Cloud Storage (GCS), so you need to integrate your Kuroco account with Firebase in advance. For instructions, refer to Cloud storage integration with Firebase.
Creating content structure
First, create a content structure to register the PDF file. Click [Content structure] from the menu to open the content structure list screen, and then click [Add].
For this tutorial, configure it as follows:
Field Name | Setting |
---|---|
Name | Auto thumbnail generation |
Also, add the following two fields:
ID | Field Name | Identifier | Field Settings |
---|---|---|---|
1 | Thumbnail image | image | Image (upload to KurocoFiles) |
2 | PDF file | pdf | File (upload to KurocoFiles) |
Once you have made the settings, click [Add] to save the content structure.
For more information on creating content structure, please refer to the tutorial, Creating content structure.
API Configuration
Next, configure the API to convert the PDF to an image. Click [API] -> [Default] from the menu to open the endpoint list screen, and then click [Add].
The API add screen will be displayed, so configure it as follows and click [Add].
Field | Setting |
---|---|
Title | pdf-to-thumbs |
Version | 1.0 |
Description | Auto thumbnail generation |
The endpoint list screen for the created API will be displayed.
API Security Configuration
Configure the API security settings. For this API, select "Dynamic Access Token" to prevent requests from external sources.
From the endpoint list screen, click on [Security].
Select [Dynamic Access Token] from the security settings, and then click [Save].
Next, you need to create endpoints.
Creating Endpoints
You will create two endpoints:
- Get contents that have no thumbnail image registered.
- Update the thumbnail image.
The "Get contents that have no thumbnail image registered" endpoint finds content that has a PDF but no thumbnail image. The "Update the thumbnail image" endpoint then registers the generated thumbnail image on that content.
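To make the division of work between the two endpoints concrete, here is a minimal Python sketch. It is not Kuroco code: the in-memory `contents` dictionary and the helper names `list_without_thumbnail` and `update_thumbnail` are hypothetical stand-ins for the content store and the two endpoints.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the "Auto thumbnail generation" contents.
Content = Dict[str, Optional[str]]
contents: Dict[int, Content] = {
    101: {"pdf": "report.pdf", "image": None},        # PDF only: needs a thumbnail
    102: {"pdf": "guide.pdf", "image": "guide.png"},  # already has a thumbnail
    103: {"pdf": None, "image": None},                # no PDF: nothing to convert
}


def list_without_thumbnail(store: Dict[int, Content]) -> List[int]:
    """Role of the first endpoint: IDs of contents with a PDF but no image."""
    return [tid for tid, c in store.items() if c["pdf"] and not c["image"]]


def update_thumbnail(store: Dict[int, Content], topics_id: int, image_name: str) -> None:
    """Role of the second endpoint: register an image on one content."""
    store[topics_id]["image"] = image_name


pending = list_without_thumbnail(contents)
for topics_id in pending:
    update_thumbnail(contents, topics_id, f"{topics_id}.png")

print(pending)                           # [101]
print(list_without_thumbnail(contents))  # []
```

Once every pending content has an image, the first call returns an empty list, so running the process repeatedly does not regenerate existing thumbnails.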
Creating the "get contents without thumbnail images" endpoint
First, let's create an endpoint to "get the contents with no thumbnail images". Click [Add new endpoint] on the endpoint list screen.
We will create the endpoint as follows:
Field | Value | |
---|---|---|
Path | no-thumb-list | |
Category | Content | |
Model | Topics | |
Operation | list | |
Basic settings | Filter | image="" and pdf!="" (Note: this filter checks for the presence or absence of text in the image description and the PDF file name) |
Basic settings | topics_group_id | Enter the ID of "Auto thumbnail generation" created earlier |
Basic settings | cnt | 0 |
After the settings are configured, click the [Add] button at the bottom to save your changes.
Creating the "Update Thumbnail Image" endpoint
Next, we will create the "Update Thumbnail Image" endpoint. Similarly, click "Add new endpoint" on the endpoint list screen, and create as follows:
Item | Content | |
---|---|---|
Path | thumb-update | |
Category | Content | |
Model | Topics | |
Operation | update | |
Basic settings | topics_group_id | Enter the ID of "Auto thumbnail generation" created earlier |
Basic settings | use_columns | image |
After the settings are configured, click the [Add] button at the bottom to save your changes.
That completes the creation of the endpoints.
Creating a temporary storage folder
Next, create a folder to store the images. This folder serves as temporary storage for the PDF and for the thumbnail image generated from it.
Click [File manager] from the menu.
Click on [GCS(Private)] and then click on [New subfolder] to create a new folder.
Enter "pdf_thumb" as the folder name, and click [OK].
The "pdf_thumb" folder has been created.
Creating batch process
Next, create the batch processes that convert PDF files into images. You will create the following two batch processes:
- Generate a thumbnail image from a PDF
  This batch process converts the first page of a PDF into an image.
- Register the generated thumbnail images as content
  This batch process registers the generated thumbnail images on the target content.
Creating the "Generate Thumbnail Image from PDF" Batch Process
First, let's create the "Generate Thumbnail Image from PDF" batch process. Click [Add] under [Operation] -> [Batch process].
Configure the following on the batch process editor.
Item | Setting |
---|---|
Title | Thumbnail images generation from PDF |
Identifier | create_thumb |
Batch | Hourly |
Next, enter the following in the execution contents.
{*Get a list of contents without thumbnails that have been registered as PDF.*}
{api_internal member_id=1 endpoint='/rcms-api/2/no-thumb-list' query='' method='GET' var='contents_list' status_var='status'}
{if $status == 1 && $contents_list.list|@count > 0}
    {foreach from=$contents_list.list key=idx item=item}
        {if !$item.image.url && $item.pdf.url}{*Image not set.*}
            {get_file url=$item.pdf.url var=temp_path save=1}
            {if $temp_path}
                {assign var=gcp_pdf_path value='files/g/private/pdf_thumb/'|cat:$item.topics_id|cat:'.pdf'}
                {assign var=gcp_img_path value='files/g/private/pdf_thumb/'|cat:$item.topics_id|cat:'.png'}
                {*Save the PDF file to a temporary directory on GCS.*}
                {put_file tmp_path=$temp_path path=$gcp_pdf_path}
                {assign var=data value=null}
                {assign_array var=data values=''}
                {assign var=data.topics_id value=$item.topics_id}
                {assign var=data.pdf value=$item.pdf}
                {*Generate thumbnails using the functionality of Cloud Functions.*}
                {make_pdf_thumb pdfPath=$gcp_pdf_path destPath=$gcp_img_path callback_batch='update_pdf_thumb_bat' data=$data}
            {/if}
        {/if}
    {/foreach}
{/if}
Please replace the 2 in /rcms-api/2/no-thumb-list with the ID of the API you just created. You can confirm the ID of the API from the URL of the endpoint list page.
Once you have finished configuring the settings, click [Add] to save the batch process.
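The string handling in this batch process can be easier to follow outside the template syntax. The Python sketch below is not Kuroco code; it only mirrors how the two GCS paths and the `data` payload passed to `make_pdf_thumb` are derived from a content's `topics_id`. The function name `build_thumb_job` and the sample values are hypothetical.

```python
from typing import Any, Dict

# Temporary folder created earlier under GCS(Private).
GCS_TEMP_DIR = "files/g/private/pdf_thumb/"


def build_thumb_job(topics_id: int, pdf: Dict[str, str]) -> Dict[str, Any]:
    """Mirror the values the batch process hands to make_pdf_thumb."""
    return {
        "pdfPath": f"{GCS_TEMP_DIR}{topics_id}.pdf",   # where the PDF copy is stored
        "destPath": f"{GCS_TEMP_DIR}{topics_id}.png",  # where the thumbnail is written
        "callback_batch": "update_pdf_thumb_bat",      # batch run after generation
        "data": {"topics_id": topics_id, "pdf": pdf},  # passed through to the callback
    }


# Hypothetical content item with topics_id 123.
job = build_thumb_job(123, {"url": "https://example.com/report.pdf", "desc": "report.pdf"})
print(job["pdfPath"])   # files/g/private/pdf_thumb/123.pdf
print(job["destPath"])  # files/g/private/pdf_thumb/123.png
```

Because both file names are based on `topics_id`, each content maps to exactly one PDF and one PNG in the temporary folder, and the callback batch can later locate them from the `topics_id` alone.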
Creating the "Registering Generated Thumbnail Images to Content" batch process
Next, create the "Registering Generated Thumbnail Images to Content" batch process.
As in the previous step, add it using the batch process editor with the following settings.
Item | Setting |
---|---|
Title | Register image to content |
Identifier | update_pdf_thumb_bat |
Batch | Batch Template |
Next, enter the following information in the execution contents.
{*Set the data obtained from Cloud Functions.*}
{assign var=topics_id value=$ext_data.data.topics_id}
{assign var=image_name value=$ext_data.data.pdf.desc|replace:'.pdf':''}
{assign var=dest_path value=$ext_data.destPath}
{assign var=file_id value='files/temp/pdf_thumb/'|cat:$topics_id|cat:'.png'}
{assign var=save_path value='/files/temp/pdf_thumb/'|cat:$topics_id|cat:'.png'}
{*Get the image file from GCS to "files/temp".*}
{get_file path=$dest_path save_path=$save_path save=1}
{*Upload the obtained image to the content.*}
{assign_array var=post_data values=''}
{assign_array var=post_data.image values=''}
{assign var=post_data.image.file_id value=$file_id}
{assign var=post_data.image.file_nm value=$image_name|cat:'.png'}
{assign var=post_data.image.desc value=$image_name}
{api_internal endpoint='/rcms-api/2/thumb-update/'|cat:$topics_id member_id=1 method='POST' queries=$post_data var='resp' status_var='status'}
{if $status==1}
    {*Delete the PDF and thumbnail upon successful processing.*}
    {remove_file path='/'|cat:$dest_path}
    {remove_file path='/'|cat:$dest_path|replace:'.png':'.pdf'}
{/if}
Please change the 2 in /rcms-api/2/thumb-update to the ID of the API that you created earlier. You can confirm the API ID from the URL of the endpoint list page.
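As a rough illustration of what this callback batch derives from `$ext_data`, the following Python sketch reproduces the same string operations. It is not Kuroco code: the function name `build_update_request`, its `api_id` parameter, and the sample `ext_data` values are hypothetical, and the shape of `ext_data` is assumed only from the fields the template reads.

```python
from typing import Any, Dict


def build_update_request(ext_data: Dict[str, Any], api_id: int = 2) -> Dict[str, Any]:
    """Mirror the values the callback batch builds before calling the update endpoint."""
    topics_id = ext_data["data"]["topics_id"]
    # The image name is the PDF description with ".pdf" removed.
    image_name = ext_data["data"]["pdf"]["desc"].replace(".pdf", "")
    dest_path = ext_data["destPath"]
    return {
        "endpoint": f"/rcms-api/{api_id}/thumb-update/{topics_id}",
        "body": {
            "image": {
                "file_id": f"files/temp/pdf_thumb/{topics_id}.png",
                "file_nm": f"{image_name}.png",
                "desc": image_name,
            }
        },
        # Files removed from GCS after a successful update: the PNG and its source PDF.
        "cleanup": ["/" + dest_path, "/" + dest_path.replace(".png", ".pdf")],
    }


sample = {
    "destPath": "files/g/private/pdf_thumb/123.png",
    "data": {"topics_id": 123, "pdf": {"desc": "report.pdf"}},
}
request = build_update_request(sample)
print(request["endpoint"])                  # /rcms-api/2/thumb-update/123
print(request["body"]["image"]["file_nm"])  # report.png
print(request["cleanup"][1])                # /files/g/private/pdf_thumb/123.pdf
```

The cleanup paths show why the PDF and PNG share a base name: the PDF path is recovered from the thumbnail path by swapping the extension.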
The batch process configurations are now complete.
Operational verification
Finally, let's perform an operational verification of the settings.
Create content from the "Auto thumbnail generation" content structure by uploading only the PDF file without uploading any images.
After uploading the PDF, click on [Add] to create the content.
Next, let's run the batch process.
Although the settings are configured to run the batch process every hour, for testing purposes, we will run it manually.
In the batch process list, click [Thumbnail images generation from PDF], which you created earlier.
Click on [Run now] next to the title.
When the alert appears, click [OK].
The batch has been executed.
Next, let's check if the images have been uploaded to the content.
Click [Content structure], and then click [List] for "Auto thumbnail generation".
Click on the content that was created earlier.
You should be able to confirm that an image has been registered in the "Thumbnail image" field.
It may take several minutes for the image to be created. If the image has not been registered, please wait for some time and check again.
Support
If you have any other questions, please contact us or check out our Slack community.