Virtual Dhamma-Vinaya Vihara

Topic started by: Johann on June 16, 2018, 07:53:02 PM

Title: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on June 16, 2018, 07:53:02 PM
Atma has just seen that the nice Wrap plugin (https://www.dokuwiki.org/plugin:wrap) has been installed, which might make it much easier to bring the HTML content of ZzE pages into the CMS/wiki.

The drawback is that the pages are then no longer as "light" as with plain wiki syntax.

However, this topic is dedicated to the styling of content: ideas, standards, whatever.

A list of replacements can be found here: http://accesstoinsight.eu/doku.php?id=de:import_zze (development in progress)
Title: Re: [dokuwiki] ATI/ZzE Content-style
Post by: Johann on June 16, 2018, 08:06:26 PM
Existing stylings on ZzE for content part:

Chapter
Code: [Select]
<div class="chapter">
replace with
Code: [Select]
<div chapter>

Editor note

Citation excerpt
Code: [Select]
<div class='excerpt'>
replace with
Code: [Select]
<div excerpt>

Cite (text source)
Code: [Select]
<p class='cite'>Text</p>
replace with
Code: [Select]
<span cite>Text</span>

Free verse
Code: [Select]
<div class="freeverse">

Verse
Code: [Select]
<div class="verse">

Tagline
Code: [Select]
<p class="tagline">
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 17, 2018, 06:34:28 PM
Atma has started here with replacements directly in the file source:

http://www.accesstoinsight.eu/doku.php?id=de:lib:authors:thanissaro:beyond1

This might be "no" problem with a tool such as Notepad++, which offers multi-file access and regex search and replace. By "hand", with the given tools and skill, it would take approximately 4000 hours for all pages.

Atma will now try to get the ZzE styling into the ati.eu CSS.
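
The class-attribute replacements listed earlier in this thread can be expressed as two regular expressions. The following is only a minimal sketch in Python (the wiki itself runs on PHP); the function and constant names are hypothetical, and only the three mappings that the list spells out (chapter, excerpt, cite) are covered.

```python
import re

# Hypothetical names; the mappings themselves follow the list earlier in the thread.
CLASS_ATTR = re.compile(r"""<(div|span)\s+class=["']([\w-]+)["']\s*>""")
CITE_PARA = re.compile(r"""<p\s+class=["']cite["']\s*>(.*?)</p>""", re.DOTALL)


def to_wrap_syntax(html):
    """Rewrite class="..." attributes into the short form used on ati.eu."""
    # <p class='cite'>Text</p>  ->  <span cite>Text</span>
    html = CITE_PARA.sub(r"<span cite>\1</span>", html)
    # <div class="chapter">  ->  <div chapter>
    return CLASS_ATTR.sub(r"<\1 \2>", html)
```

Both single and double quotes around the class name are accepted, since the ZzE pages use both.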
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 25, 2018, 10:24:34 PM
A list of replacements can be found here: http://accesstoinsight.eu/doku.php?id=de:import_zze (development in progress)
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 26, 2018, 10:12:36 AM
First action with replace plugin:

Code: [Select]
^ find ^ replace with ^ matches ^
| /<!DOCTYPE(.*?)<body>/s | <body> | 6508 |

Quote
Display: Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini. in Unknown on line 0

It seems that no action was taken, and building the search result page (maybe 10-50 MB large) took time.

Next try: reduce the amount by selecting namespaces.

Code: "ns de:lib:" [Select]
^ find ^ replace with ^ matches ^
| /<!DOCTYPE(.*?)<body>/s | <body> | 663 |


This seems to have executed! Atma will proceed step by step in this way, according to the replace list (http://accesstoinsight.eu/doku.php?id=de:import_zze)

The replace plugin (see the php.ini issue above) can, as it seems, execute "only" 1000 replace requests at once for now, but it works and is a useful way.
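
For checking such a pattern outside the wiki, the header deletion can be tried on a single page first. This is a minimal sketch assuming Python's re module rather than the plugin's PHP engine; re.DOTALL plays the role of the trailing "s" modifier, so that "." also matches line breaks. The sample page is made up.

```python
import re

# Python counterpart of /<!DOCTYPE(.*?)<body>/s
HEADER = re.compile(r"<!DOCTYPE(.*?)<body>", re.DOTALL)


def strip_header(page):
    """Replace everything from <!DOCTYPE up to the first <body> with <body>."""
    return HEADER.sub("<body>", page, count=1)


sample = "<!DOCTYPE html>\n<html>\n<head><title>t</title></head>\n<body>\n<p>text</p>\n</body>"
stripped = strip_header(sample)
```

The non-greedy (.*?) matters here: it stops at the first <body> instead of running to the last one in the file.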
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 26, 2018, 12:28:43 PM
header-deletion
find: /<!DOCTYPE(.*?)<body>/s replace with <body>


de:lib: 663, de:tipitaka:sut:an: 483, de:tipitaka:sut:sn: 471, de:tipitaka:sut:kn:j: 614, de: (rest) 797, en:lib: 654, en:tipitaka:sut:an: 364, en:tipitaka:sut:sn: 470, en:tipitaka:sut:kn:j: 611, en:tipitaka:vin: 616, en:tipitaka:sut:kn: (rest) 400, (rest overall) 365. A search for *<body> still gives matches.


6458 matches (if the quick mental arithmetic was right) of 6508.
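
The per-namespace counts above can also be added up mechanically. In this small Python check the numbers are copied from the list as posted; the dictionary keys are only labels. The sum comes to 6508, the same as the overall match count from the first attempt, so the mental estimate appears to be 50 short.

```python
# Counts as posted above; keys are labels only.
counts = {
    "de:lib": 663,
    "de:tipitaka:sut:an": 483,
    "de:tipitaka:sut:sn": 471,
    "de:tipitaka:sut:kn:j": 614,
    "de (rest)": 797,
    "en:lib": 654,
    "en:tipitaka:sut:an": 364,
    "en:tipitaka:sut:sn": 470,
    "en:tipitaka:sut:kn:j": 611,
    "en:tipitaka:vin": 616,
    "en:tipitaka:sut:kn (rest)": 400,
    "rest overall": 365,
}

total = sum(counts.values())
print(total)  # 6508
```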

Forgot the redirect pages (apart from 22, which can be found with "<!DOCTYPE html PUBLIC"); the other redirects seem to be lost so far.

Bug? After a more complex search request, the result page's input fields are broken. See attached. The " character seems to be the issue.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 28, 2018, 01:02:15 PM
The replace plugin works fine so far; of course there is much time to invest before regex can be used most efficiently.

Two things, Nyom Moritz. First, the limit of 1000 matches is very low for large clean-ups. Could Atma change that? Is it accessible via FTP? And would it be no problem to increase it to maybe 20,000 or even much higher?

Quote
Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini. in Unknown on line 0

Second, the replace seems to have no variables like \L to convert a string to lower case, or is it just a lack of syntax knowledge on my person's side? (To get a word, say "halLo_Was_iSt", replaced by "hallo_was_ist".)
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on June 28, 2018, 07:31:34 PM
the limit of 1000 matches is very low for large clean-ups. Could Atma change that? Is it accessible via FTP? And would it be no problem to increase it to maybe 20,000 or even much higher?
This file is not accessible for us, I think.
But I changed the code to work around that.
It should work now for larger numbers, but it could be very slow. The browser might show a message like "script is not responding" and ask whether to continue the script; simply answer "yes" and wait.

the replace seems to have no variables like \L to convert a string to lower case, or is it just a lack of syntax knowledge on my person's side? (To get a word, say "halLo_Was_iSt", replaced by "hallo_was_ist".)

I don't know. Maybe there is another syntax for it. I found this: https://stackoverflow.com/questions/34592160/regex-string-substitution-upper-and-lower-case
But I don't know at the moment if that would provide a solution.

_/\_
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 28, 2018, 09:22:06 PM
Sadhu!
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 29, 2018, 12:26:59 PM
The replacement works well for large amounts. The only thing that takes time is the building of the result page, which is of course huge. Sadhu
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on June 30, 2018, 08:38:23 PM
Firefox copes quite well with loading the result pages.

Today Atma uploaded all the ZzE txt files again, re-indexed them, and started over from the beginning.

After a day of regexing around, everything works quite nicely. Sometimes, with larger requests, the following appears after an abort (in the middle of overwriting):

Quote
Fatal error: Maximum execution time of 60 seconds exceeded in /var/www/clients/clientxxx/webxxx/web/inc/io.php on line 235

This may be related to the connection, though, since no particular pattern was recognized. -> It apparently occurs when more than 4000 pages are affected (regardless of the number of matches on each).

Piece by piece, in all directions, the codes and layout... changing links, data tables... It will probably take another full week or two until there is a first presentable appearance.

Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 01, 2018, 09:14:48 AM
Sadhu!

Sometimes, with larger requests, the following appears after an abort (in the middle of overwriting):

Quote
Fatal error: Maximum execution time of 60 seconds exceeded in /var/www/clients/clientxxx/webxxx/web/inc/io.php on line 235

This may be related to the connection, though, since no particular pattern was recognized. -> It apparently occurs when more than 4000 pages are affected (regardless of the number of matches on each).

That has nothing to do with the connection, but with the server having a time limit for executing a single script. Replacing in 4000 pages apparently takes too long and is then aborted.

Apparently, though, a script can set and raise its own time limit again and again while it is running. I have now built in accordingly that it reserves another 120 seconds for each file. That should easily be enough, and such aborts should no longer occur.

(I have also now merged the latest changes by the original author, who has given my hacked-together "select all" a graphical interface that fits cleanly with DokuWiki.)

_/\_
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 01, 2018, 09:32:13 AM
the replace seems to have no variables like \L to convert a string to lower case, or is it just a lack of syntax knowledge on my person's side? (To get a word, say "halLo_Was_iSt", replaced by "hallo_was_ist".)

I don't know. Maybe there is another syntax for it. I found this: https://stackoverflow.com/questions/34592160/regex-string-substitution-upper-and-lower-case
But don't know at the moment if that would provide a solution.

_/\_

It seems such things like \L and \U for lowercase and uppercase replacement are not "standard" regex features, but only part of extra stuff in some programs.
It would surely be possible to build something like this into the BatchEdit plugin as well. I am not really knowledgeable about regular expressions, but have just found a really good manual (https://www.regular-expressions.info) with clear explanations. So maybe I would try to add things like that when I understand more and find time for it.
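
Where \L is not available, a common workaround is to pass a function instead of a replacement string, so that each match can be transformed in code. The sketch below shows the idea in Python; it is not the BatchEdit plugin's behaviour, and the helper name and the sample pattern are assumptions made for illustration.

```python
import re


def lowercase_matches(pattern, text):
    """Replace every match of pattern with its lower-case form."""
    # The callable receives the match object and returns the replacement text.
    return re.sub(pattern, lambda m: m.group(0).lower(), text)


# Words joined by underscores, e.g. "halLo_Was_iSt"
result = lowercase_matches(r"[A-Za-z]+(?:_[A-Za-z]+)+", "see halLo_Was_iSt here")
```

PHP offers the same approach through preg_replace_callback, which is presumably how such a feature would have to be added to the plugin.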

And I also still don't know how the indexing works and how fast or slow it is. Have there ever been problems with the BatchEdit plugin showing old results that had not been updated, even when a newer version should exist?
When looking at the code it seems to me that the BatchEdit plugin will always load the latest version and show matches accordingly. So a slow index might perhaps only be a problem sometimes for new pages that have not even been included in the index for the first time so that no results for it would be found...
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 01, 2018, 02:47:24 PM
So far, Nyom Moritz, all works fine for progressing step by step. No real problems seen with regard to the index, as it is used only for selecting the available files, while the search is done directly in the files (so no need for refreshing or re-indexing, except when new files are added).

Quote from: https://www.dokuwiki.org/plugin:batchedit#page_lookup
Page lookup

BatchEdit uses DokuWiki page index to get the list of existing pages instead of going through the data directories. If the index is incomplete the plugin will not see some pages. This also applies to the “special” pages, for example, namespace templates.

So BatchEdit uses the index only for the list of files.

The index itself: not sure for now, but it seems that refreshing the index also picks up new files. It is not too slow.

In regard to regex, yes, the back-references seem to be special. Here $ seems to work as well for inserting a string (\1 = $1); maybe it also works for \L = $L (not tried so far).

Since Atma does not intend to learn or invest much in these skills, whenever a need arises he would look at, and maybe adopt, samples given around here.

It's good then, step by step, to make explanatory pages on ATI.eu in all regards, for easier future work and handover, e.g.:

http://accesstoinsight.eu/doku.php?id=de:tech:regex_use
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 01, 2018, 04:39:08 PM
Speaking of problems... Nyom Moritz (attached)

Note that after copying the new files into cs-rm (Cattasanghayana - Roman), Atma only did a refresh of the index, so maybe this caused an appearance never seen before; it was also the first time regexing in cs-rm.

Oh... maybe this error has to do with the uploaded image/media files (maybe wrong upper case in the names); have to look at them and rename them...

Done so far, but not re-indexed yet. It seems that the regex also addresses media files; it is not clear how far (or only when collecting the possible files available). If it also executes on them likewise, this could probably become a mess.

Now the replacement task there gave:

Code: [Select]
Fatal error: Uncaught Error: Call to undefined function setTimeLimit() in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php:412
Stack trace:
#0 /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php(385): admin_plugin_batchedit->applyMatches()
#1 /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php(102): admin_plugin_batchedit->apply()
#2 /var/www/clients/client2157/web5417/web/inc/Action/Admin.php(47): admin_plugin_batchedit->handle()
#3 /var/www/clients/client2157/web5417/web/inc/ActionRouter.php(83): dokuwiki\Action\Admin->preProcess()
#4 /var/www/clients/client2157/web5417/web/inc/ActionRouter.php(48): dokuwiki\ActionRouter->setupAction('admin')
#5 /var/www/clients/client2157/web5417/web/inc/ActionRouter.php(60): dokuwiki\ActionRouter->__construct()
#6 /var/www/clients/client2157/web5417/web/inc/actions.php(16): dokuwiki\ActionRouter::getInstance(true)
#7 /var/www/clients/client2157/web5417/web/doku.php(120): act_dispatch()
#8 {main}
thrown in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php on line 412

Atma will do a reindex, since the files have different names now and the index still holds the old ones. Maybe that solves it.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 01, 2018, 09:37:11 PM
Speaking of problems... Nyom Moritz (attached)

/.../

Code: [Select]
Fatal error: Uncaught Error: Call to undefined function setTimeLimit() in ...

Atma will do a reindex, since the files have different names now and the index still holds the old ones. Maybe that solves it.

Oh, that is my error. I wrote 'setTimeLimit' instead of 'set_time_limit' in the program, without testing it. So should have nothing to do with indexing any new files. I will change that. One moment...

Okay, I changed it. Now it should work correctly, I hope. But have not tested it.

_/\_
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 01, 2018, 09:52:01 PM
Indexing is still slow; it might be retested tomorrow, Nyom Moritz, so as not to interrupt the progress.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 02, 2018, 07:07:15 AM
The media error is still there (attachment: messy layout, no "select all", and the result page is the same as the search page); indexing might not have completed, since the battery ran empty overnight.

Execution now gives no fatal error.

The "$" command no longer seems to work in the replace line. Maybe a change to the plugin code has been made. Haven't tested "\".

Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 02, 2018, 03:44:52 PM
After the index update finished, the "layout error" still exists, as in the picture above. The text says:

Code: [Select]
Warning: file_get_contents(/var/www/clients/client2157/web5417/web/lib/plugins/batchedit/images/file-document.svg): failed to open stream: No such file or directory in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php on line 833

Warning: file_get_contents(/var/www/clients/client2157/web5417/web/lib/plugins/batchedit/images/pencil.svg): failed to open stream: No such file or directory in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php on line 833

Warning: file_get_contents(/var/www/clients/client2157/web5417/web/lib/plugins/batchedit/images/arrow-down.svg): failed to open stream: No such file or directory in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php on

On reflection, my person added the images to the directory, having taken them from the GitHub download (trusting that this would be welcome), and now it seems to work fine with regard to layout.

The download on DokuWiki is missing those images. My person reported it via the forum (https://forum.dokuwiki.org/post/61558).

However, the result page now lacks the number of pages matched and the total number of matches, which is a useful check and estimate of success.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 02, 2018, 07:41:48 PM
Oh, I forgot to include the images that were added by the original author in his most recent updates. The DokuWiki download still has the version from February. The original author, Mykola Ostrovskyy (https://github.com/dwp-forge/batchedit) has not yet created a new release version (https://github.com/dwp-forge/batchedit/releases) since February. It seems he is still working on some major changes he would like to add before the next "official" version.

So all errors here are just because I forgot to upload certain new files. But now I think it should be okay?

However, the result page now lacks the number of pages matched and the total number of matches, which is a useful check and estimate of success.

I'm not sure how this could be. Testing from here, I get information like this:

After "Preview":
Quote
Search results: 9808 matches on 1019 pages

After "Apply":
Quote
Edit results: 9808 matches on 1019 pages, 2 replacements applied

"$" comand seems no more working in replace-line. Maybe a change of plugin-code have been done. Havn't tested "\"

The replacement syntax has not been changed. Testing here, both "\" and "$" work for inserting match back-references.

For example:

regex: "/(mindfulness)/"

replacement: "$1 test \1"

will replace "mindfulness" with "mindfulness test mindfulness".
Seems to be working without problem here.

_/\_
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 02, 2018, 07:55:00 PM
Just came across another error that could happen when the amount of matches is really huge (for example, searching for "/is/" - must have millions of matches probably):

Quote
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 69632 bytes) in /var/www/clients/client2157/web5417/web/lib/plugins/batchedit/admin.php on line 336

So just to inform: if you come across that, it is because there are too many results to keep in memory (the limit shown, 134217728 bytes, is 128 MiB).

I had some discussion with the author who is currently in the process of making some major changes, also thinking about how to deal with huge result sets (https://github.com/dwp-forge/batchedit/issues/16#issuecomment-401596844). So I think I should mention that to him as well and maybe help and try to find a solution. But at the moment don't have much time for this.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Moritz on July 02, 2018, 08:13:41 PM
"$" comand seems no more working in replace-line. Maybe a change of plugin-code have been done. Havn't tested "\"

The replacement syntax has not been changed. Testing here, both "\" and "$" work for inserting match back-references.

Maybe another hint: do not forget the parentheses ().

\1, \2, \3 or $1, $2, $3 ... etc. are references to the groups inside parentheses. With no parentheses, there is no input for \1, \2, $1, $2 etc.

example:
regex: "/(\s[a-zA-Z]*) something in between (mindfulness)/"

replacement: "\1 something different \2"

would replace like this:

"satipatthana something in between mindfulness"
=> "satipatthana something different mindfulness"

"ariyasacca something in between mindfulness"
=> "ariyasacca something different mindfulness"

_/\_
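
The two-group example above can be tried out in a few lines. This is a sketch in Python, not in the plugin's PHP engine, and the names are hypothetical; note that Python accepts only \1 (or \g<1>) in the replacement, not $1. Because the first group starts with \s, the word before "something in between" is only captured when it is preceded by whitespace.

```python
import re

PATTERN = re.compile(r"(\s[a-zA-Z]*) something in between (mindfulness)")


def swap_middle(text):
    """Keep both captured groups and exchange only the text between them."""
    return PATTERN.sub(r"\1 something different \2", text)


changed = swap_middle("on ariyasacca something in between mindfulness")
# No whitespace before the first word, so \s cannot match and nothing changes.
unchanged = swap_middle("ariyasacca something in between mindfulness")
```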
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 02, 2018, 09:18:11 PM
Oh, I forgot to include the images that were added by the original author in his most recent updates. The DokuWiki download still has the version from February. The original author, Mykola Ostrovskyy (https://github.com/dwp-forge/batchedit) has not yet created a new release version (https://github.com/dwp-forge/batchedit/releases) since February. It seems he is still working on some major changes he would like to add before the next "official" version.

So all errors here are just because I forgot to upload certain new files. But now I think it should be okay?

However, the result page now lacks the number of pages matched and the total number of matches, which is a useful check and estimate of success.

I'm not sure how this could be. Testing from here, I get infos like this:

After "Preview":
Quote
Search results: 9808 matches on 1019 pages

After "Apply":
Quote
Edit results: 9808 matches on 1019 pages, 2 replacements applied

Maybe it's a matter of display, caused by the responsive layout for mobile devices.
But what was just white before now contains the matches.

/me : There seems to be a lot to understand in regard to the cache ("Zwischenspeicher"). Also troubles with the favicon, even though it has been placed in all places, and a big issue in cs-rm: the site takes the old version as the newer one, meaning all "drafts" have to be recovered one by one.

"$" comand seems no more working in replace-line. Maybe a change of plugin-code have been done. Havn't tested "\"

The replacement syntax has not been changed. Testing here, both "\" and "$" work for inserting match back-references.

For example:

regex: "/(mindfulness)/"

replacement: "$1 test \1"

will replace "mindfulness" with "mindfulness test mindfulness".
Seems to be working without problem here.

_/\_

That's great. It might again have been a certain momentary personal handicap, here and there.
Title: Re: [ATI.eu] ATI/ZzE Content-style
Post by: Johann on July 02, 2018, 09:27:22 PM
"$" comand seems no more working in replace-line. Maybe a change of plugin-code have been done. Havn't tested "\"

The replacement syntax has not been changed. Testing here, both "\" and "$" work for inserting match back-references.

Maybe another hint: do not forget the parentheses ().

\1, \2, \3 or $1, $2, $3 ... etc. are references to the groups inside parentheses. With no parentheses, there is no input for \1, \2, $1, $2 etc.

...
_/\_
Sadhu for acting so obligingly.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 26, 2018, 05:08:30 PM
Lookahead: no idea why there is a match but no replacement, for example with this regex:

Code: "find" [Select]
/\[\[([^\w]*)\/(?=lib\/|tipitaka\/|cdrom\/|extras\/|news\/|noncanon\/|ousources\/|pdf\/|s\/|tech\/)/


Code: "replace with" [Select]
[[de:


A given [[../../../lib gets the match [[../../../ and the replacement looks the same: [[../../../lib ?

Certainly no longer my person's sphere at all, this chess-like thinking...
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 12, 2018, 08:35:25 PM
(...a lookahead needs something behind it. Guess the issue is solved for my person.)
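
How the lookahead behaves can be seen in a shortened version of that pattern. This is a sketch in Python with only two of the folder names and hypothetical identifiers; it may or may not reproduce what happened in the plugin. The lookahead (?=...) checks the following text without consuming it, so "lib/" stays in place, and it only succeeds if the folder name is really followed by a slash.

```python
import re

# Shortened form of the pattern above, with only two folder alternatives.
LINK_PREFIX = re.compile(r"\[\[([^\w]*)/(?=lib/|tipitaka/)")


def localize(text):
    """Replace the relative prefix of a link with the [[de: namespace."""
    return LINK_PREFIX.sub("[[de:", text)


with_slash = localize("[[../../../lib/authors|x]]")
without_slash = localize("[[../../../lib|x]]")
```

In the second call nothing is replaced, because "lib" is followed by "|" and not by "/".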

The replacement tool has a problem with the " character. Put into the search or replace field, it breaks the search or replace string after execution. This can be worked around by using [^\w] instead, at least for the search.
Title: strange error
Post by: Johann on August 15, 2018, 04:16:51 PM
A strange error appeared while doing one replacement after another. But it seems to be fine after simply logging in again.

Screenshot attached.

Note on a "bug": if matches are selected but "preview" is pushed again, it returns the replacements in green although they were only previewed.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Moritz on August 17, 2018, 05:22:57 AM
The first looks like an error on Greensta's side. The database was unavailable for a moment it seems.  :-| But if not happening more often, hopefully not a big problem.

The second bug was introduced by me. I just wanted to have different colors for match and replacement.
So instead of having yellow for both, I wanted to have yellow and green in the preview.
And red and green after the replacement.

But it seems I have not changed it for the first preview, where both are still yellow.

The original author was also wondering why I did this change. Now I see it's different for the first preview. Okay.

A lot of new work has been done (https://github.com/dwp-forge/batchedit/commits) in the meantime by the original author (https://github.com/dwp-forge) and others, including some really helpful new features like a progress bar, so that one can estimate how much more time a replacement will take for large updates. And much cleaner solutions to the small changes that I made.

I think I should update to the new version soon.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 17, 2018, 07:15:37 AM
Good to hear. For now, Atma is used to it and knows its capacity and ways well.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Moritz on September 06, 2018, 04:39:30 AM
I installed the new version of BatchEdit plugin.
There are some new useful options next to the search input:



There is also a cog wheel symbol in the top right corner next to these options, which brings up some additional options:

Also there is now a time limit on how long a search or replacement can take (can be changed in Admin settings). I have set this to 10 hours now. Should be enough usually.

Very helpful: there is now a progress bar for the search and replacement progress, which helps to estimate how much longer it will take (it is very light grey, difficult to distinguish from white).

Not tested much, hopefully not any new errors.


Edit: Just tested searching for "dhamma" with no limit of results; returns an empty result page. Probably too many results so that something gets broken.
Searching for "dhamma" with limit of 16000 results works, and takes a few minutes to complete.
Searching for "Johann" without limit works and gives 2076 results.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on September 06, 2018, 07:43:58 AM
Sadhu!

The options "multiline", upper/lower case... replace the use of delimiters and modifiers.
Previously one put the search between /{string}/x, where x would define the handling of upper/lower case, more than one line...

So it is more user-friendly. Let's see whether it can keep up with the previous "hack" in regard to 10,000 and more matches.

Seems to work fine, shortly tested.
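
The relation between the old /{string}/x notation and separate options can be sketched as follows. This is Python, not the plugin's code; the helper name and the modifier table are assumptions, covering only the four common modifier letters.

```python
import re

# Common PCRE-style modifier letters and their Python flag counterparts.
FLAG_MAP = {
    "i": re.IGNORECASE,  # ignore upper/lower case
    "m": re.MULTILINE,   # ^ and $ match at every line
    "s": re.DOTALL,      # "." also matches line breaks
    "x": re.VERBOSE,     # whitespace in the pattern is ignored
}


def compile_delimited(expr):
    """Turn a string like "/dhamma/i" into a compiled pattern with flags."""
    body, sep, modifiers = expr[1:].rpartition("/")
    if not expr.startswith("/") or sep == "":
        raise ValueError("expected /pattern/modifiers")
    flags = 0
    for letter in modifiers:
        flags |= FLAG_MAP[letter]
    return re.compile(body, flags)
```

Splitting at the last slash means escaped slashes inside the pattern, as in the link regex earlier in this thread, do not get in the way.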
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on September 17, 2018, 02:40:43 PM


1. Spaces at line beginnings and line breaks before tags, using find: \n[\s]+< and replace with \n\n. This is difficult, on the one hand because the replaced text would nevertheless give a match again, and on the other because doing it folder by folder would take long and has its end at the root language namespace. (Only lib:thai: could be managed so far; the namespace thanissaro would require 300 MB+, everything in a language namespace probably some 10 GB.) A possible way, if nothing else is found, is maybe a two-step approach: first replacing with some special character, and later replacing this with two line breaks. In this way the matches can be reduced slowly, step by step (about 20-50 h).

2. Replacing p-tags with two line breaks, using something like find <\/p>[\s]*<p> and replace with \n\n.

3. The many spaces and tabs between tags, without touching or destroying unformatted text pages (not thought about in detail, but this would be a mass problem as well).

4. Later on, things like em, i, b, br, u and s-tags, while these matches can of course be reduced step by step.

5. Of course there will be other mass replacements that are harder to manage, but they can all be done with beggar "tricks", effort and patience, as always.
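
Points 1 and 2 can be sketched in Python as below. The function names are hypothetical, and keeping the "<" in the replacement is an assumption about the intent. The last lines illustrate why the matches in point 1 never go down: \s also matches a line break, so the output of the replacement matches the original pattern again, while a narrower class of spaces and tabs does not.

```python
import re


def join_paragraphs(text):
    """Point 2: turn a closing and reopening p-tag into one blank line."""
    return re.sub(r"</p>\s*<p>", "\n\n", text)


def unindent_tags(text):
    """Point 1, narrowed: only spaces and tabs before a tag are matched."""
    return re.sub(r"\n[ \t]+<", "\n\n<", text)


result = unindent_tags("a\n   <div>")
still_matches_original = re.search(r"\n\s+<", result) is not None
still_matches_narrow = re.search(r"\n[ \t]+<", result) is not None
```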

/me : switching back to the huge amount of PTS dictionary -> accessibility replacements for "dummies" and those not wishing to become scholars or x.y.z., ax4 language speakers, Brahmans or dependent on them, before or rather than gaining awakening.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on September 21, 2018, 06:46:31 PM
The result page no longer displays the number of matched pages and matches. Only the number of replacements is displayed, after execution.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Danilo on July 19, 2019, 09:20:17 AM
Bhante and readers,  _/\_

I've managed to learn how to use dokuwiki a bit and have improved some details in this (http://www.accesstoinsight.eu/en/theravada) page as a sample.
Among other improvements, I've used DokuWiki's footnotes feature, because with it the reader doesn't need to go to the bottom of the page to read a note.
If there is no problem with the changes, I would like to use this page as a model to edit the other pages as well.

Note that the words (those wrapped in "//") in the titles can't be displayed as italic. But this can be fixed by installing this (https://www.dokuwiki.org/plugin:wikiformatstyling) plugin.

It's pretty simple:
1. Rename the plugin's directory to "wikiformatstyling"
2. Place the directory in "dokuwiki/lib/plugins"

I have tested it and it worked.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 19, 2019, 10:30:42 AM
Sadhu, Sadhu

If there is any general styling idea, it is good to give it as a sample. My person is currently in the process of regexing all HTML stuff globally on the pages.

It might need another few weeks to match everything. The i, b, em, strong and u tags may already be completely replaced. Some a-tags are still to be matched; anchors and picture links may still take much work, and images might need some manual care, since they need the whole path for linking to the larger picture.

If Upasaka Danilo focuses on one page, that is great.

The header styling seems to be great, just not sure how far it might cause problems with other plugins like include. Generally my person thinks that the fewer plugins, the fewer troubles and maintenance issues.

Not sure whether simply removing stylings in headers, which my person thought of doing after having replaced all HTML, might not be better.

Best to coordinate plugin issues with Upasaka Moritz and also let him keep the overview of installations, or at least let it be known if he does not have time.

Some comments on the edits Nyom Danilo made:

Footnotes: generally it is good to use the wiki's tools, but in regard to the many, many pages, and the impossibility of global replacement without errors (since they are very different), my person would not make use of it for old pages. Using wiki syntax only also becomes a problem for extended footnotes including blockquotes, lists... a great challenge even with the ya-list mode, but possible.

Removing div-tags and adding styling tags: until now Atma has looked simply to bring everything to one standard. So there might be parts in the header which will then be globally removed or replaced. The CSS has only some stylings yet and is not finished for now.

Anchors still in headers and being removed: although indexes could be removed, there are many cross-links on other pages. If the original anchors are removed, one would need to seek out all links to them and change them as well. That is huge work and possible so far, but in regard to old links all around the internet (ZzE links will later get redirects to ATI), one would cause a lot of "dead" links. So anchors should best never be removed.

Divisions around headers:

Code: [Select]
<WRAP centeralign>​ 
===== A Brief Summary of the Buddha'​s Teachings ​=====
</WRAP>

There are still divisions wrapping headers (using div instead of WRAP is generally preferred). This causes the section edit not to work. Atma's objective is to fix that globally, but only after all HTML is replaced.

Further: there is no need for particular styling of headers, since that can be, and actually already is, done in the CSS sheet. They already have centeralign styling.

Styling in links: like

Code: [Select]
[[ptf/​dhamma/​sacca/​sacca1/​index_en.html|//​Dukkha://]]

No need to do that manually; there are thousands of such. They will be replaced globally by regex. Half-styled link texts will not be easy to maintain and so will be removed.
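
A global replacement of this kind might look like the following sketch. It is written in Python with hypothetical names and is not the regex actually used on ati.eu; it only strips the // markers when they enclose the whole link text, and leaves half-styled link texts untouched.

```python
import re

# [[target|//text//]]  ->  [[target|text]]
ITALIC_LINK_TEXT = re.compile(r"(\[\[[^\]|]*\|)//(.*?)//(\]\])")


def plain_link_text(text):
    """Remove italic markers that wrap an entire link text."""
    return ITALIC_LINK_TEXT.sub(r"\1\2\3", text)


cleaned = plain_link_text("[[ptf/dhamma/sacca/sacca1/index_en.html|//Dukkha://]]")
```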

Block-quotes and other div in content:

Code: [Select]
	<blockquote>"​Birth is ended, the holy life fulfilled, the task done! There is nothing further for the sake of this world."​ 	+	<WRAP indent> 
- <cite> [[en:​ptf:​buddha#​done|MN 36]]</cite></​blockquote>

They are all fine as they are; it is already replaced with wiki and wrap tags, like this. It displays wrongly as code because there is still a tab at the beginning of the line. Removing just the tab here will display it perfectly so far. blockquote is an additional plugin ATI uses, including the cite-tag.

html-values:

There are still HTML entities such as &iuml; around. If coming across them, best to collect them in a list with their proper replacements, so that my person can apply this globally to all pages.
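
Such a list of entities and their replacements could also be produced automatically. The sketch below uses Python's standard html module; the function name is hypothetical, and it only looks at named entities (not numeric ones like &#239;).

```python
import html
import re

# Named entities only, e.g. &iuml; or &amp;
ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def list_entities(text):
    """Map each named entity found in text to the character it stands for."""
    return {e: html.unescape(e) for e in sorted(set(ENTITY.findall(text)))}


found = list_entities("na&iuml;ve &amp; na&iuml;f")
```

Each pair in the result can then be entered as one find/replace step in the replace tool.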

Footer:

No need to but much effort in manual editing the divs and styling. That is an issue for thousands of pages and will be made at "once".

Content edits - Styling edits:

If seeing typos or small style issues in the text, aside from div and span tags, great if correcting them. If seeing something strange in an old HTML tag, best to report it and collect such cases in one place.

Sadhu for efforts! And mudita.

Atma thinks, however, that it would be easier to undo the edits and repeat some of the small ones. As Upasaka thinks best, now possibly better informed.


Mudita
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 19, 2019, 11:21:51 AM
It's pretty simple:
1. Rename the plugin's directory to "wikiformatstyling"
2. Place the directory in "dokuwiki/lib/plugins"

Installing plugins: this is best, securely and easily done via the Admin panel. Not sure if Nyom Danilo has admin rights, which should be no problem. It is good, however, since some tools are very powerful and could even destroy much, to coordinate with Nyom Moritz or to ask my person if unsure about something.

Regarding how to tell whether tags and divs are already changed or old ones: old HTML tags always include ="..." or other marks. The styling wiki tags always look like this: <div class_texts #anchor_text>, <span class_texts #anchor_text>, <span #anchor_text> or <div class_texts>. If seeing others than these, i.e. old ones, best to collect them in a list with a link to the place where seen (maybe a topic only for that, or just a post here).
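
The rule of thumb above (old tags carry ="..." or ='...', new wiki tags do not) can be turned into a small check. This is a minimal Python sketch; the function name find_old_tags and the regex are assumptions for illustration, not an existing tool on the site.

```python
import re

# Assumed heuristic: an opening tag that contains an attribute
# assignment (= followed by a quote) is an old HTML tag.
OLD_TAG = re.compile(r"""<[A-Za-z][^<>]*?=\s*["'][^<>]*>""")


def find_old_tags(text: str) -> list:
    """Return (line number, tag) pairs for tags that still look like old HTML."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in OLD_TAG.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


page = "\n".join([
    '<div class="chapter">',        # old
    "<div chapter #intro>",         # new wiki style
    "<span cite>Text</span> <p class='cite'>Text</p>",
])
for lineno, tag in find_old_tags(page):
    print(lineno, tag)
```

The output of such a check, together with the page name, would give the kind of list asked for here.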

Not sure if Nyom Danilo is familiar and skilled with regex: powerful, but also dangerous, as it can destroy a lot. Let it be known if wishing to use it for global changes.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Moritz on July 19, 2019, 10:00:43 PM
Sadhu! (and good to see Danilo back here)

I have not read everything in detail now, but have, after taking a quick look at it, installed the WikiStyle Script (https://www.dokuwiki.org/plugin:wikiformatstyling) plugin.
It seems the plugin does not change any stored data, but only affects how things are rendered.

It is possible that there are conflicts with other plugins, like the include plugin, as Bhante says. But it would not destroy any data.
If there are any problems with it, one can simply uninstall/deactivate it and maybe think of other solutions.

I have not yet looked much at the results or at whether everything works correctly, but the example page (http://www.accesstoinsight.eu/en/theravada) seems to look fine so far.

I gave Danilo Admin rights now.

_/\_ _/\_ _/\_


/me will not have time to look at anything deeper until Sunday
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 19, 2019, 10:09:06 PM
Sadhu

The mod also seems to solve the problem of styling in link texts, and seems to save a lot of work so far.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Danilo on July 21, 2019, 08:14:27 AM
Quote from: Johann
Not sure if Nyom Danilo is familiar and skilled with regex: powerful, but also dangerous, as it can destroy a lot. Let it be known if wishing to use it for global changes.

I have some experience with regex.

If Bhante thinks it's a good idea, he could specify the patterns to be matched in the HTML pages and the expected output in the DokuWiki pages. That way a standard model would be clearly defined for reference, and I (or anyone else) could come up with the corresponding regex rules and make the appropriate changes.

When editing the HTML page, I saw many tags which didn't appear to have any effect, so I ended up removing them.


Thanks, Moritz. _/\_
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 21, 2019, 11:45:25 AM
Although it is no problem, Nyom Danilo: in general, in Dhamma, when correcting, never risk that something gets lost. If not seeing the use of something for now, simply "hide" it. In that way the Dhamma could be maintained up to our days.

Atma, as said, will need some more days to replace most of it (lists and tables will need manual edits).

The next step would be to bring it into a nice, easy standard and create templates for new pages.

ATI had an extensive standard, and my person thinks that most of it is good to carry on. It takes a while to understand it all (working with it for 7+ years now, still finding hidden treasures).

Is Nyom familiar with CSS?

Some of the ATI syntax features can be found at Ati.eu Syntax. Detailed documentation is not written yet, although it has been started. The older posts in this topic give some impressions for understanding.

To see what Atma is currently doing, best to check the Activity Lists or http://accesstoinsight.eu/index?do=recent

If wishing to use regex on many pages (be careful: it can damage much, which is not easy or even impossible to recover, incl. all Sanghayana Tipitakas), the batcheditor (https://www.dokuwiki.org/plugin:batchedit) tool can be found in the Admin area.

(Over the last year+ Atma has started all anew surely 2 or 3 times... 10,000s of pages, because of some mistakes...)
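
One way to lower that risk is to preview a replacement on a copy of a page before applying it anywhere. The following is a minimal Python sketch of such a dry run; preview_replace is a hypothetical helper and works on plain strings, not on the wiki or the batch-edit tool itself.

```python
import difflib
import re


def preview_replace(pattern: str, repl: str, text: str, name: str = "page"):
    """Apply a regex replacement to a string and return (new text, diff lines)."""
    new_text = re.sub(pattern, repl, text)
    diff = difflib.unified_diff(
        text.splitlines(),
        new_text.splitlines(),
        fromfile=name + " (old)",
        tofile=name + " (new)",
        lineterm="",
    )
    return new_text, list(diff)


old = 'intro\n<div class="chapter">\ntext'
new, diff = preview_replace(r'<div class="chapter">', "<div chapter>", old)
print("\n".join(diff))
```

If the diff shows anything unexpected, the original string is still untouched and the pattern can be corrected first.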
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 26, 2019, 02:40:22 PM
Update info:

Link tags should now all be replaced by wiki markup.

Certain things regarding links will still need further fixes.



My person will focus now on replacing the rest of div and span tags.

The Portuguese pages, btw, are left aside for now: although only 3 pages, they would have increased the work and time by 50%, and it seems much faster to edit the three pages manually alongside the global replacements.

The last and greatest challenge of replacements will then be that of the special lists and tables... and of course many small and special things will be left. 
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on July 31, 2019, 05:28:32 PM
Divisions should be "fine" already. A few odd spans are left and will be done tomorrow, when the sun is shining.

There is one issue regarding classes, but even more so ids: lower-case letters. While Notepad++ makes it possible to replace with the lower-case value, batch-edit seems not to support \l$1. Since there might be a lot of anchors, and links to them, containing upper case (the links are less problematic, since the system cuts them down to lower case, if right, also for the anchor extension), maybe someone has an idea how to transform them more globally.
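
Outside of batch-edit, lower-casing can be done with a replacement function instead of a replacement string. This is a minimal Python sketch, assuming the new-style tags described earlier in this topic (<div class_texts #anchor_text>, <span #anchor_text>); the function name lowercase_wiki_tags is made up for illustration, and it would have to be run on exported page text, not inside the wiki.

```python
import re

# New-style wiki tags only: div/span whose attributes contain no "=",
# so old HTML tags such as <div id="Foo"> are not touched.
WIKI_TAG = re.compile(r"<(div|span)\b([^<>=]*)>")


def lowercase_wiki_tags(text: str) -> str:
    """Lower-case class names and #anchors inside new-style div/span tags."""
    def repl(match):
        return "<" + match.group(1) + match.group(2).lower() + ">"
    return WIKI_TAG.sub(repl, text)


print(lowercase_wiki_tags("<div Chapter #Intro>Text With Caps</div>"))
```

Only the tag itself is changed; the text between the tags keeps its capitals.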
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 02, 2019, 02:48:04 PM
Looks like certain character combinations are not matched properly by the header-styling plugin. Sample:

=== //Object:// ===

returns

Object://

Atma didn't test it in detail yet.

http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0020

Ohh... the doc of the wikistyle plugin says: ":// won't be converted."
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 02, 2019, 05:49:56 PM
Plain text replacement &amp; -> & seems to have "destroyed" certain characters, mostly Pali.

More likely it was a simple header replacement of the h5 tag. The action caused invisible characters in &.. HTML values.

http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?do=diff&rev2%5B0%5D=1564461374&rev2%5B1%5D=1564741857&difftype=sidebyside

Could such happen just by connectivity problems?

Not sure if it's better to try to recover the versions with the Spam tool or to replace the strange characters (here the invisible ones might be difficult to handle well).
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Moritz on August 03, 2019, 03:51:39 AM
That looks strange. No clear idea at the moment.
Knowing exactly what the regex to replace was might help to understand it better.

Connection problems should not be a possible reason.
Possibly a programming error from some of the modifications I made.

I think it is best to keep the old revision, better than trying to replace from this result.

I can look in the morning, or if Bhante could send me the FTP password (already have the password now), maybe I find something useful from here (at work now, in taxi, but a "boring" night so far only sitting and waiting).

_/\_
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 03, 2019, 06:15:49 AM
search:

/\s*<h5 id=["']([^"']+?)['"]>([^<>]+?)<\/h5>\s*/

replace:

\n\n=== //$2// ===\n<span anchor #$1></span>\n\n

Very simple.
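
For testing outside the wiki, the same search and replace can be reproduced on a plain string. This is a minimal Python sketch of the pattern above, with $1/$2 written as \1/\2; the function name convert_h5 and the sample text are assumptions for illustration.

```python
import re

# Same pattern as posted above, without the surrounding /.../ delimiters.
H5 = re.compile(r"""\s*<h5 id=["']([^"']+?)['"]>([^<>]+?)</h5>\s*""")
# \2 is the header text, \1 the id that becomes the anchor.
REPLACEMENT = r"\n\n=== //\2// ===\n<span anchor #\1></span>\n\n"


def convert_h5(text: str) -> str:
    """Turn <h5 id="...">Title</h5> into a wiki header plus anchor span."""
    return H5.sub(REPLACEMENT, text)


sample = 'before\n<h5 id="object">Object:</h5>\nafter'
print(convert_h5(sample))
```

On a Python string this only inserts ASCII markup around the captured text, so running it on a copy of a page is one way to check whether the pattern alone changes any non-ASCII characters.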
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 03, 2019, 06:44:48 AM
As it affected all non-standard characters: maybe batchedit has some process that deals with character sets and that was possibly interrupted by connectivity problems. Sometimes, when there is no reaction, Atma sends commands twice, which might "disturb" ongoing processes.

h3 and h4 were done with similar regexes before, but that didn't touch the sample page (maybe 20 pages affected in all).

Something unusual is that this change is printed in gray in the list, possibly pointing to something.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 03, 2019, 09:36:32 AM
Interesting is also that the replacements, although looking similar, are unique on each page. So the additionally added invisible characters are different while the strings appear to be equal.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Moritz on August 03, 2019, 01:47:35 PM
Still no clear idea.

But it seems I have also not yet seen the full picture of what all went wrong.


So, apparently, something must have gone wrong when saving the page http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059

between the subsequent revisions (http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?do=revisions)
2019/07/30 06:36 -- div headings (http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?rev=1564461374)
and
2019/08/01 16:10 -- <p id (http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?rev=1564668616)

where the UTF8 encoding of special characters somehow was garbled, which can be seen in the comparison (http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?do=diff&rev2%5B0%5D=1564461374&rev2%5B1%5D=1564668616&difftype=sidebyside) between these two directly subsequent revisions.

That is the obvious error that I have seen. Everything else mentioned is not clear to me:

Quote from: Johann
Plain text replacement &amp; -> & seems to have "destroyed" certain characters, mostly Pali.

More seemingly a simple header replacement of h5-tag. The action caused invisible characters in &.. html values.

I have not found where invisible characters appear in relation to h5-tag replacements or elsewhere.

Quote from: Johann
Interesting is also that the replacments, althought looking similar, are uniqu on each page. So the additional added invisible characters are different while the string appears to be equal.

I have not seen any other page than BMC section 59 where the mentioned UTF8 encoding error happened, exactly between these two revisions (http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?do=diff&rev2%5B0%5D=1564461374&rev2%5B1%5D=1564668616&difftype=sidebyside).

Can Bhante point to other pages where this happened?


In any case this looks like an encoding error, which happened only one or maybe more times (which I have not seen) when saving some page(s) with BatchEdit.

My first idea would have been that maybe the file(s) was/were edited in an external program in between which could not deal properly with the UTF8 encoding and saved it wrongly.
DokuWiki has some mechanism for recognizing external edits and including them as such in the revision history. But it might be that it does not always work, does not always become "aware" when something was changed from outside (I think the check only happens when saving), so that the change would appear included in another change (in this case a replacement which was mostly correct, but based on a file that had already been modified and wrongly encoded in between).
Not sure about that, if that would be possible that especially BatchEdit might skip the mechanism of "being aware" if something has changed from outside.
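
What such a wrong re-encoding looks like can be shown on a short string. This is only a minimal Python sketch of one possible mechanism (UTF-8 bytes read as Latin-1 and saved again), under the assumption that Latin-1 was involved; it is not a finding about what actually happened on the page, and the function names are made up for illustration.

```python
def garble(text: str) -> str:
    """Simulate the error: UTF-8 bytes wrongly read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def looks_garbled(text: str) -> bool:
    """True if the text can be 'un-garbled' and that changes it."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


def repair(text: str) -> str:
    """Reverse garble(); only valid if looks_garbled(text) is True."""
    return text.encode("latin-1").decode("utf-8")


broken = garble("P\u0101li")
# repr() shows the extra control character, which is invisible when printed.
print(repr(broken), looks_garbled(broken), repair(broken))
```

In this simulation the second byte of the letter turns into a control character that is not visible on screen, which would fit the impression of "invisible characters" in the damaged strings.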

Apart from that, I have seen that BatchEdit has become a lot more complex since the last time I looked into the code. Many things happen which I don't understand so quickly. For example it looks like the results of a BatchEdit search (with matches and replacements, even before they are applied) are stored in a certain structure in some temporary files, and most likely they are read back again from those files when finally applying the replacements. Maybe it could happen somehow that the data gets wrongly encoded there sometimes and reloaded from there afterwards for some reason with the wrong encoding. But that is just vague speculation now since I don't really yet see how it all works.


As long as it is not possible to replicate the error in some clear test case, I think it will be difficult to figure it out.  ::)


_/\_ _/\_ _/\_

/me now probably not having much time the next days or week to find the reason.
Title: Re: [ATI.eu] Replacement, regex issues (Content styling)
Post by: Johann on August 03, 2019, 02:05:14 PM
The idea of invisible characters came up because a search for the defective strings would only match the page they were copied from. As for which pages are affected: just the next, next, ... page of BMC, seemingly only in the BMC2 part.... ohh, ... No. The next-page links are wrong and always lead to the same page.

So it's just one page. My person then guesses that a connection problem caused a certain action not to be completed. Atma also thinks that it would not be worth investigating the plugin fully and redeveloping it. Maybe, however, it would be good to inform the developer that such things happen.

Sadhu for care and much joy with good undertakings, Nyom Moritz.