Community

Connect with us and enhance your M-Files experience using Unitfly Toolkit for M-Files. Here’s how to get started.


[Solved] Regex directly from document content

Topic starter

In the past, I have successfully used regex in Property Operations to extract data from existing metadata properties by specifying the relevant property as a parameter in the new property setter.

Now I need to use regex to extract data from the full text (or at least the first pages) of a document. My configuration does not seem to work without a parameter. Is it possible to extract data directly from the document text? If so, how do I configure it?

7 Answers

Yes, you can now use this property (the multi-line text property holding the text from the file) with a regex and save the results in a new property.

You can do that in Property Operations by:

  1. Selecting "Advanced" configuration mode
  2. Selecting "OnSourceObject" copy mode
  3. Selecting "Function" value type in your property setter
  4. Adding the property that holds the text from the file as a parameter
  5. Entering the regular expression of your choice
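To make the regex step concrete, here is a minimal Python sketch of what a Function-type setter with one parameter does. It rests on two assumptions: that only the first match is used (as reported later in this thread) and that a named group called value is returned when the pattern defines one (the pattern quoted later in the thread uses such a group). regex_setter is a hypothetical name, not part of Extension Kit, and Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re
from typing import Optional


def regex_setter(parameter_text: str, pattern: str) -> Optional[str]:
    """Hypothetical model of a regex property setter with one parameter.

    Assumption: only the first match is used, and the named group
    'value' is returned if the pattern defines it.
    """
    match = re.search(pattern, parameter_text)
    if match is None:
        return None  # no match: the target property stays empty
    # Prefer the named group 'value' when the pattern defines one.
    if "value" in match.groupdict():
        return match.group("value")
    return match.group(0)


# Made-up text as it might appear in the property filled from the file.
file_text = "Invoice no. INV-2023-0042\nDue date: 15/12/2023"
print(regex_setter(file_text, r"INV-(?P<value>\d{4}-\d{4})"))  # 2023-0042
```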

Hi @karl-lausten, Extension Kit cannot manipulate text from files, so what you are looking for is not possible.

The only similar use case is to read data from Excel cells into properties and then manipulate the text using regex.

Topic starter

OK, I created a multi-line text property with a script that automatically gets the text from the file (or as much of it as fits into the property). This property can then be used as a parameter in EK.
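The script itself is not shown here, so the following covers only the "as much as fits" part, as a Python sketch rather than an M-Files script. MAX_CHARS is a placeholder assumption, not the documented limit of a multi-line text property, and fit_to_property is a hypothetical name. Cutting at the last whitespace keeps a value at the boundary from being split into a fragment that a regex could partially match.

```python
import re

# Placeholder limit (assumption): check the real maximum length of the
# multi-line text property in your vault before relying on this value.
MAX_CHARS = 10_000


def fit_to_property(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return as much of the extracted file text as fits into the property."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    # Find the last whitespace character and cut there, so the final
    # word is dropped rather than truncated mid-word.
    last_ws = re.search(r"\s\S*\Z", head)
    return head[:last_ws.start()] if last_ws else head


print(fit_to_property("alpha beta gamma", max_chars=12))  # alpha beta
```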

Karl Lausten Topic starter 29/11/2023 4:56 pm

One more thing:

You can create a regex that yields multiple values if more than one match is found in the text. As far as I can tell from the Separator configuration, however, it will only deliver one result from each parameter, so you will only get multiple results if you have several parameters.
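A minimal Python model of that reading of the Separator, assuming each parameter contributes only its first match and the Separator joins the per-parameter results. setter_with_separator is a hypothetical name, and the "; " separator and sample values are arbitrary.

```python
import re
from typing import List


def setter_with_separator(parameters: List[str], pattern: str,
                          separator: str = "; ") -> str:
    """Hypothetical model: at most one result per parameter, with the
    Separator joining results across parameters."""
    results = []
    for text in parameters:
        match = re.search(pattern, text)  # first match only, per parameter
        if match:
            results.append(match.group(0))
    return separator.join(results)


pattern = r"\b\d{3}\b"
print(setter_with_separator(["111 222", "333"], pattern))  # 111; 333
print(setter_with_separator(["111 222 333"], pattern))     # 111
```

In this model, "222" in the first parameter is lost, which matches the behaviour described: more parameters give more results, more matches in one parameter do not.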

I have tested with a regex like this: (\b(?<value>[\w|\d]{17})\b)+ and one parameter (one multi-line text property). The parameter in the test case has at least 5 matches for this regex, but only the first one shows up as a result. I can change the configuration to pick one of the other results, but not to pick all of them. In this particular case I need all of the results. Is there any way to do that?
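The difference between taking the first match and taking all matches is easy to see outside Extension Kit. This Python sketch uses the pattern from the post with two points worth noting: Python writes named groups as (?P<value>...) instead of (?<value>...), and the character class [\w|\d] also matches a literal "|" (while \d is already covered by \w), so \w alone is probably what was meant. The trailing + does not produce several results either; getting every match requires iterating over all matches, which is what finditer does here. The sample text is made up.

```python
import re

# The pattern from the post, rewritten for Python's re module,
# which spells named groups (?P<name>...) instead of (?<name>...).
PATTERN = re.compile(r"(\b(?P<value>[\w|\d]{17})\b)+")

# Made-up sample text with three 17-character values.
text = "Items: ABCDEFGHIJ1234567, KLMNOPQRST7654321 and UVWXYZABCD0000001."

# What the thread reports: only the first match is used.
first = PATTERN.search(text).group("value")

# What is needed here: every match, not just the first.
all_values = [m.group("value") for m in PATTERN.finditer(text)]

print(first)       # ABCDEFGHIJ1234567
print(all_values)  # ['ABCDEFGHIJ1234567', 'KLMNOPQRST7654321', 'UVWXYZABCD0000001']
```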

Karl Lausten Topic starter 01/12/2023 9:40 am

Any ideas anyone?


Apologies for the wait, @karl-lausten.


Unfortunately, it is not possible to extract multiple results from each parameter. You would need one property and one rule for each result.

Karl Lausten Topic starter 01/12/2023 12:07 pm

@viktorzagajski Thank you for pointing that out.

This adds a new challenge in a situation where there could be 1 to 5 results (with no upper limit). I would have to create n rules to cover n possible results, but I am not sure whether I can add the result from rule 2 to the same property as the result from rule 1 without overwriting the first result.
It is not clear whether the Separator works across multiple rules or only when there are multiple input parameters to the same rule.

If it is not possible to place the results in the same property, then I would have to create multiple copies of the receiving property and place the result from each rule in a different property. This again leads to new challenges when the results need to be used in new functions going forward. I may have to concatenate the results in yet another property!

I would encourage Unitfly to consider adding an option to allow multiple results to be extracted from one rule.

Karl Lausten Topic starter 01/12/2023 12:26 pm

@viktorzagajski Testing has just confirmed that multiple rules will overwrite the results from previous rules if I use the same property for the results.
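Why the last rule wins can be modelled in a few lines of Python, assuming each rule simply assigns its result to the target property. The second loop shows what accumulating across rules would require (reading the current value and appending to it); this is purely illustrative, and nothing in this thread indicates that Extension Kit offers such an option. The property name Result and the sample text are hypothetical.

```python
import re

text = "Refs: A1, B2, C3"
matches = re.findall(r"\b[A-Z]\d\b", text)  # one match per hypothetical rule

# Model of the tested behaviour: every rule assigns to the same
# property, so each rule replaces what the previous rule wrote.
overwritten = {}
for value in matches:  # rule 1, rule 2, rule 3
    overwritten["Result"] = value
print(overwritten["Result"])  # C3

# What accumulating would require instead: read, then append.
accumulated = {}
for value in matches:
    current = accumulated.get("Result")
    accumulated["Result"] = f"{current}; {value}" if current else value
print(accumulated["Result"])  # A1; B2; C3
```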


@karl-lausten could you please try with 1 rule but multiple property setters?

So let's say you have 5 properties, one for each result, and then 1 property that combines them.

I would propose you try creating 1 rule with 6 property setters:

  • 5 property setters, one for each result
  • 1 property setter that combines the previous 5 properties using the concatenate text function
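The suggested rule can be sketched in Python as a model of the idea, not as Extension Kit configuration. It assumes each helper setter is configured to pick the n-th match of the same pattern (the topic starter notes above that another match can be selected), and that the final setter joins the non-empty helpers. nth_match and run_rule are hypothetical names; the pattern, text, and ", " separator are made up.

```python
import re
from typing import List, Optional, Tuple


def nth_match(text: str, pattern: str, n: int) -> Optional[str]:
    """Model of one helper setter: the n-th match (0-based) or None."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    return matches[n] if n < len(matches) else None


def run_rule(text: str, pattern: str, slots: int = 5,
             separator: str = ", ") -> Tuple[List[Optional[str]], str]:
    # Setters 1..slots: one helper property per possible result.
    helpers = [nth_match(text, pattern, n) for n in range(slots)]
    # Final setter: concatenate the non-empty helper properties.
    combined = separator.join(v for v in helpers if v)
    return helpers, combined


helpers, combined = run_rule("Refs: A1, B2, C3", r"\b[A-Z]\d\b")
print(helpers)   # ['A1', 'B2', 'C3', None, None]
print(combined)  # A1, B2, C3
```

Note that any match beyond the last slot is silently dropped, so this only works when the number of results per document has a known maximum.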

I hope this helps with your use case.

Topic starter

Thank you @viktorzagajski, that will have to do for now.

It does get a bit cumbersome, though. In the actual case, the customer now says that there may be up to 10 instances in one document (still without an upper limit), and we need to extract data for 4 different properties. That adds up to 40-45 properties rather than 4. Certainly not ideal, even when you can hide the additional properties from common users. So my recommendation stands: consider a smarter way to extract multiple values for a specific property.
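The 40-45 figure follows from the one-rule workaround: each target property needs one helper property per possible instance, plus one concatenation property if the results are combined. A small Python sketch of that count, where properties_needed is a hypothetical helper:

```python
def properties_needed(max_instances: int, targets: int,
                      combine: bool = True) -> int:
    """Helper properties per target, plus one concatenation property
    per target when the results are combined."""
    per_target = max_instances + (1 if combine else 0)
    return targets * per_target


print(properties_needed(10, 4, combine=False))  # 40
print(properties_needed(10, 4))                 # 44
```

Since there is no upper limit on instances, any fixed max_instances can still be exceeded by a real document.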


Thank you @karl-lausten. I have created a new user story for our developers, but currently I do not have any information on if and when this change will be added to Extension Kit.

I will contact you as soon as I have more information.
