CreatedBy and ModifiedBy not imported.

Aug 11, 2011 at 2:52 PM
Edited Aug 11, 2011 at 3:56 PM

I've been testing this tool on my development machine today, and I think I've come across a problem. The Created By and Modified By data in the SharePoint library is always set to the logged-on user, never the actual user who created or edited the imported document.

I downloaded the source code and have traced the problem to what I believe is the cause: the NullUserInformationManager object does not return any information about the source file's author or editor; it just returns null. It looks to me like this is a feature that remains to be implemented.

Can someone confirm that this feature works?

Nov 17, 2011 at 11:04 PM

I'm having the same issue. I have just tried this tool for the first time and it has worked amazingly, but my one problem is that it is not importing the created date. It is importing the modified date for files (not folders). I really need the CREATED DATE to be imported as well. Any update on this discussion?

Mar 7, 2012 at 10:46 AM

Hi,

I downloaded this tool yesterday.  Great stuff!

However, I have just imported a set of files and folders and have the same issue. The CreatedBy and ModifiedBy are set to me - the logged-on user who did the import. They do not show the original creator and modifier.

Any fix?

Mark

Mar 12, 2012 at 1:20 PM

Hi Mark,

For Office documents (.docx, .xls, ...) the CreatedBy and ModifiedBy should be imported automatically. For other file types this was never specifically implemented. If you wanted to do this yourself, you would have to implement and use a new IMetadataProvider. There are currently no plans to implement this.

Mel.

Mar 15, 2012 at 5:10 PM

Mel,

It looks like the only IMetadataProvider in the project is the NullUserInformationManager. I traced through the code, and it looks like CreatedBy and ModifiedBy are defaulted to the authenticated user whenever that provider is used.

What is the status of the project? I notice on your LinkedIn profile that you are no longer with Orbit One. I have made some modifications that allow an alternate IMetadataProvider to be specified on the command line and dynamically loaded from a DLL at runtime. I would be willing to submit the changes back to this project if it is still active; otherwise, I was thinking of forking it to GitHub.
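For anyone curious, the loading piece is plain reflection; stripped down, it looks something like this (the command-line switch name and type names here are only illustrative, not part of the shipped tool):

using System;
using System.Reflection;

static class MetadataProviderLoader
{
    // Hypothetical sketch: load an IMetaDataProvider implementation from an
    // external assembly named on the command line, e.g.
    //   -metadataprovider:"C:\plugins\MyProvider.dll,MyNamespace.MyProvider"
    public static IMetaDataProvider Load(string assemblyPath, string typeName)
    {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);
        Type providerType = assembly.GetType(typeName, throwOnError: true);
        return (IMetaDataProvider)Activator.CreateInstance(providerType);
    }
}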

Thanks for your work on this.  It's saved me tons of time.

Tyler

Mar 20, 2012 at 12:36 PM

Hi Tyler,

I myself am not actively working on this project, or even with SharePoint, for the moment. I follow the project updates and now and then take time to answer a question, but that's about it. I've forwarded this to the people at Orbit One.

If you want a faster response, contacting one of the other coordinators directly might be best.

May 4, 2012 at 8:48 PM

Hi Tyler, is there any way you can send that to me? I am in desperate need of being able to import the Created By field correctly. Thank you very much. torralaq@yahoo.com

May 22, 2012 at 7:03 PM

Tyler, I'd appreciate a copy of your version as well, unless you've already forked it. ndespres@cmitsolutions.com. Thanks!

Jul 3, 2012 at 11:42 AM

Hi Tyler,

Any chance I can grab a copy with your modifications?

Thanks,

Mark 

Feb 7, 2013 at 1:28 PM
Hi Tyler, can you send me a copy of your solution for handling metadata? Thanks, dguarneri71@gmail.com. I have to upload a directory structure where every file has its own metadata.
I want to put the files and metadata in the same folder, creating a metadata file with a custom extension (for example ".metadata") for each file to be loaded, and use a custom provider. Do you think this is possible?
Feb 7, 2013 at 5:08 PM
Edited Feb 7, 2013 at 5:11 PM
Hi all,

My company's restrictions ended up preventing me from uploading my entire modified solution. However, I have recreated some demonstration classes to explain the bulk of the process I used to implement an AD user import as well as a general metadata import.

To be able to import domain users, each user must already be an active user on the SharePoint site. Your custom metadata provider (SharenetCustomMetadataProvider in the example below) uses an instance of an IUserMapper (SharenetUserMapper below) to resolve username strings to their SharePoint equivalents. The SharenetUserMapper in turn uses another class provided here, CorpDomainUserRepository, to connect to the given SharePoint site and fetch a list of all SharePoint users on that site.
// Assumed usings: System, System.Collections.Generic, System.IO, System.Linq,
// and Microsoft.SharePoint.Client (for ClientContext and CamlQuery).
// IMetaDataProvider, IUserMapper, User and SettingsSingleton are the importer
// project's own types.
public class SharenetCustomMetadataProvider : IMetaDataProvider
{
    private readonly IUserMapper myUserMapper;

    public SharenetCustomMetadataProvider()
    {
        myUserMapper = new SharenetUserMapper(SettingsSingleton.GlobalSettings.SiteUrl);
    }

    public User GetAuthor(string filename)
    {
        var d = ParseMetadata(filename);
        return d.ContainsKey("CreatedBy")
            ? myUserMapper.Map(d["CreatedBy"].ToUpperInvariant())
            : null;
    }

    public User GetEditor(string filename)
    {
        var d = ParseMetadata(filename);
        return d.ContainsKey("ModifiedBy")
            ? myUserMapper.Map(d["ModifiedBy"].ToUpperInvariant())
            : null;
    }

    public IDictionary<string, string> GetMetaData(string filename)
    {
        // CreatedBy/ModifiedBy are handled by GetAuthor/GetEditor above, so
        // strip them out of the general metadata dictionary.
        var d = ParseMetadata(filename);
        d.Remove("CreatedBy");
        d.Remove("ModifiedBy");
        return d;
    }

    private static Dictionary<string, string> ParseMetadata(string diskPath)
    {
        var dict = new Dictionary<string, string>();
        string metadataFilename = null;

        // Folders carry their metadata in a 'folder.meta' file inside the
        // folder; files carry theirs in a sibling '<filename>.meta' file.
        if (Directory.Exists(diskPath))
            metadataFilename = Path.Combine(diskPath, "folder.meta");
        else if (File.Exists(diskPath))
            metadataFilename = diskPath + ".meta";

        if (!String.IsNullOrEmpty(metadataFilename) && File.Exists(metadataFilename))
        {
            // Each line is a 'Key=Value' pair; everything after the first '=' is the value.
            foreach (string line in File.ReadAllLines(metadataFilename))
            {
                int eq = line.IndexOf('=');
                if (eq < 0) continue;
                dict[line.Substring(0, eq)] = line.Substring(eq + 1);
            }
        }

        return dict;
    }
}

public class SharenetUserMapper : IUserMapper
{
    private string SiteCollectionUrl { get; set; }
    private CorpDomainUserRepository m_repository;
    private IDictionary<string, User> m_mappings;

    public SharenetUserMapper(string siteCollectionUrl)
    {
        SiteCollectionUrl = siteCollectionUrl;
    }

    public User Map(string username)
    {
        // Lazily build a username -> SharePoint user lookup on first use.
        if (m_repository == null)
        {
            m_repository = new CorpDomainUserRepository(SiteCollectionUrl);
            m_mappings = m_repository.GetUsers().ToDictionary(
                user => user.Name.ToUpperInvariant(),
                user => user);
        }

        User mapped;
        if (!m_mappings.TryGetValue(username.ToUpperInvariant(), out mapped))
        {
            Console.WriteLine("can't find user: " + username.ToUpperInvariant());
            return null;
        }
        return mapped;
    }
}

public class CorpDomainUserRepository
{
    private readonly string m_url;

    public CorpDomainUserRepository(string url)
    {
        m_url = url;
    }

    public IList<User> GetUsers()
    {
        // Pull every user from the site's user information list via CSOM.
        using (var context = new ClientContext(m_url))
        {
            var userList = context.Web.SiteUserInfoList;
            var users = userList.GetItems(new CamlQuery { ViewXml = "<View/>" });
            context.Load(users);
            context.ExecuteQuery();

            var ulist = new List<User>();
            foreach (var user in users)
            {
                ulist.Add(new User
                {
                    Id = Convert.ToInt32(user["ID"]),
                    Name = user["Name"].ToString()
                });
            }
            return ulist;
        }
    }
}
My particular implementation was to do what danieleg mentioned above: for each document imported into SharePoint, I had a corresponding metadata file in the same folder. For example, if I had Spreadsheet.xls, I would have an additional Spreadsheet.xls.meta file (matching the '.meta' extension the code above looks for), with each line holding a key/value pair separated by an equals sign. One thing to remember is that you need to mark these .meta files as hidden, or the importer will import them into SharePoint as well. Here are the example contents of one of those files:
Description0=Emergency Support for Process
URL=https://sharenet.google.com/Open/389034796
BaseName=Emergency Support for Process
ModifiedBy=addomain\sptadmin
CreatedBy=addomain\johndoe
SharenetID=389034814
In the example above, URL and SharenetID are custom fields defined on my doc library. BaseName and Description0 correspond to the filename and description fields. ModifiedBy and CreatedBy correspond to AD users. The OrbitOne code tries to automatically map any field names to a corresponding existing field on the SharePoint object.
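On the hidden-file note above: a quick one-off console app can flag every .meta file under the import root at once; something like this (root path passed as the first argument):

using System.IO;

class HideMetaFiles
{
    static void Main(string[] args)
    {
        // Flag every metadata file as hidden so the importer does not
        // upload the .meta files themselves into SharePoint.
        foreach (var path in Directory.EnumerateFiles(args[0], "*.meta", SearchOption.AllDirectories))
        {
            File.SetAttributes(path, File.GetAttributes(path) | FileAttributes.Hidden);
        }
    }
}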

If you look at the SharenetCustomMetadataProvider, the ParseMetadata() method examines the passed-in source file and loads the corresponding metadata file, should it exist. I also added support for attaching metadata to folders via a 'folder.meta' file that lives inside each folder.

FYI, the code that invokes the metadata file load is inside the FileSystemSource.Load() method.

To directly specify which custom metadata provider to use, you can edit the constructor of FileSystemSource(); you can see in there where the FileSystemSource.MetaDataProvider object is being initialized.

If there are any questions, please leave them here. I will monitor the thread and answer as best I can.
Feb 7, 2013 at 5:59 PM
Edited Feb 7, 2013 at 6:11 PM
Here I will describe some things I learned during my migration job with this application. I migrated over 300K files using this tool, so it can definitely hold up and do the job.

One key is to divide and conquer. My entire migration consisted of about 20 sub-jobs. It is impossible to get through a massive migration in one shot; you will run into roadblocks.

On very large jobs, it is important to be able to restart where you left off should the program terminate in the middle of a job. To do this, I added a moveFileAsCompleted(file) call to the SharePointRepository CreateFile() method, right after the 'fileCreated = true' line. This meant that if my job errored, I could identify and fix the error and immediately resume where I left off. moveFileAsCompleted simply moves the file that was just uploaded into a 'completed' area outside of the path the OrbitOne importer is working on. It looked something like this:
        // Moves an uploaded file (and its .meta companion) out of the working
        // path so an interrupted job can resume where it left off.
        private void moveFileAsCompleted(ImportFile file)
        {
            var newPathFull = file.SourceFile.FullName.Replace("\\Version", "\\VersionMoved");
            var newPath = file.SourceFile.DirectoryName.Replace("\\Version", "\\VersionMoved");
            if (!Directory.Exists(newPath))
            {
                Directory.CreateDirectory(newPath);
            }
            System.IO.File.Move(file.OriginalFullName, newPathFull);

            // Move the metadata companion file along with the document, if present.
            var metaFile = file.SourceFile.FullName + ".meta";
            if (System.IO.File.Exists(metaFile))
            {
                System.IO.File.Move(metaFile, newPathFull + ".meta");
            }
        }
There is a file-size cap of 200 MB by default in the code. If you need to override this, check out SharePointRepository.CreateValidator(); there you can also edit blocked file extensions and other options you may wish to alter.
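If you would rather catch oversized files before a run than raise the cap, a simple pre-flight scan (using the default 200 MB figure) will list the offenders; for example:

using System;
using System.IO;

class SizePreflight
{
    static void Main(string[] args)
    {
        // List every file under the import root that exceeds the importer's
        // default 200 MB cap, so they can be handled before the job starts.
        const long maxBytes = 200L * 1024 * 1024;
        foreach (var path in Directory.EnumerateFiles(args[0], "*", SearchOption.AllDirectories))
        {
            var fi = new FileInfo(path);
            if (fi.Length > maxBytes)
                Console.WriteLine(fi.FullName + " (" + fi.Length + " bytes)");
        }
    }
}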

If you wish to load multiple versions of a file, my approach was to have a folder structure representing each version. For example, if I had 200 files and 3 of those went all the way up to version 5, I would have a folder structure like so:
../Version1/(my base)
../Version2/(my base)
../Version3/(my base)
../Version4/(my base)
../Version5/(my base)
The Version1 folder would hold all 200 of my documents. The Version2 folder would hold only those of the 200 which had at least two versions. The Version5 folder would hold only the 3 files which actually had a version 5. With that folder structure in place, I would run a .bat file which sequentially called the importer, similar to the one below:

...\bin\Debug\OrbitOne.SharePoint.Importer.CommandLine.exe -site:"http://mysite.com/programmgmt"   -documentlibrary:"Program Documents"  -username:xxx  -password:xxx -domain:corp -CreateFolders  -folder:"E:\xml_export\programmgmt\Version1\"
...\bin\Debug\OrbitOne.SharePoint.Importer.CommandLine.exe -site:"http://mysite.com/programmgmt"   -documentlibrary:"Program Documents"  -username:xxx  -password:xxx -domain:corp -CreateFolders  -folder:"E:\xml_export\programmgmt\Version2\"
...\bin\Debug\OrbitOne.SharePoint.Importer.CommandLine.exe -site:"http://mysite.com/programmgmt"   -documentlibrary:"Program Documents"  -username:xxx  -password:xxx -domain:corp -CreateFolders  -folder:"E:\xml_export\programmgmt\Version3\"
...\bin\Debug\OrbitOne.SharePoint.Importer.CommandLine.exe -site:"http://mysite.com/programmgmt"   -documentlibrary:"Program Documents"  -username:xxx  -password:xxx -domain:corp -CreateFolders  -folder:"E:\xml_export\programmgmt\Version4\"
...\bin\Debug\OrbitOne.SharePoint.Importer.CommandLine.exe -site:"http://mysite.com/programmgmt"   -documentlibrary:"Program Documents"  -username:xxx  -password:xxx -domain:corp -CreateFolders  -folder:"E:\xml_export\programmgmt\Version5\"
As long as version tracking is turned on in your doc library, each version should subsequently upload OK. I do recall that I had to alter the DocumentLibraryRepository.CreateFile() function to enable uploading over an existing file. Here is a version of that function which also includes detailed error handling and some retry logic:

        void CreateFile(ImportFile importFile, ClientContext context)
        {
            string serverRelativeFileUrl = string.Concat(m_serverRelativeListUrl, importFile.ServerRelativePath);

            using (var stream = importFile.OpenRead())
            {
                if (m_settings.Mode == ImportMode.Execute)
                {
                    log.Info("In CreateFile: " + ApplicationUrl + serverRelativeFileUrl);
                    int tries = 0;
                    bool successful = false;
                    const int limit = 10;
                    while (tries < limit && !successful)
                    {
                        try
                        {
                            tries++;
                            stream.Position = 0; // rewind in case a failed attempt advanced the stream
                            // overwrite: true, since existence checks are done earlier anyway.....TS
                            log.Info("About to call File.SaveBinaryDirect: " + serverRelativeFileUrl);
                            File.SaveBinaryDirect(context, serverRelativeFileUrl, stream, true);
                            successful = true;
                        }
                        catch (Exception ex)
                        {
                            // Only 500-style server errors are treated as transient and retried.
                            if (ex.Message.ToLower().Contains("internal server error"))
                            {
                                log.Info("Caught retryable File.SaveBinaryDirect exception for: " + serverRelativeFileUrl);
                            }
                            else
                            {
                                throw; // anything else is fatal; rethrow without losing the stack trace
                            }
                        }
                        if (tries >= limit && !successful)
                        {
                            throw new Exception("Retried " + limit + " times and still didn't succeed for: " +
                                                serverRelativeFileUrl);
                        }
                    }

                    log.Info("Succeeded ( " + tries + " attempt(s) ): " + serverRelativeFileUrl);
                }
            }
        }
Job validation: part of my process was validating that each file had been uploaded to SharePoint after the job completed. My method was to leverage my custom metadata. In my metadata I had a field, SharenetID, which contained a unique ID for each document. So when a job finished, I ran a process which read all my source files, parsed their SharenetIDs into a giant list, and verified that each of those IDs was found on a document in SharePoint. When this was done, I had a list of all docs which had not migrated successfully, and I handled these on a case-by-case basis. Sometimes the problem exposed a bug in my code, which I fixed. Sometimes there was an invalid file type that failed to upload. Sometimes I would just upload the file via the browser and be done with it. If the documents you are migrating do not have a unique identifier, you could create a script/app which simply appends an additional metadata value with a unique GUID to each object. Then, after importing, you have a means to verify completion on a file-by-file basis.
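For that no-unique-ID case, stamping an ID into each file's metadata is only a few lines; a rough sketch, assuming the .meta convention used above (the UniqueImportID field name is just an example):

using System;
using System.IO;

class StampGuids
{
    static void Main(string[] args)
    {
        // Append a unique ID line to each source file's .meta companion so the
        // upload can be verified file-by-file after the job completes.
        foreach (var path in Directory.EnumerateFiles(args[0], "*", SearchOption.AllDirectories))
        {
            if (path.EndsWith(".meta")) continue; // skip the metadata files themselves
            File.AppendAllText(path + ".meta",
                "UniqueImportID=" + Guid.NewGuid().ToString("N") + Environment.NewLine);
        }
    }
}

You would also need a matching UniqueImportID column on the target library for the value to land anywhere.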

The 'workhorse' block of code in my verifier is below. This is the code which connects to a SharePoint library and pulls down all values of a given column (in my code below, SharenetID):
           //Dictionary<String, String> xmlIds =   .....my externally mapped list of IDs I am checking on SharePoint....

           // A HashSet makes the per-ID lookup O(1), which matters with 300K+ files.
           var spDocs = new HashSet<string>();

            using (var context = CreateContext(pArga))
            {
                context.RequestTimeout = 10 * 60 * 1000; // ten minutes
                List list = context.Web.Lists.GetByTitle(pArga.SharePointLibrary);

                // Recursive scope walks every folder; only the SharenetID column is pulled back.
                CamlQuery cq = new CamlQuery();
                cq.ViewXml =
                    "<View Scope='Recursive'><ViewFields><FieldRef Name='SharenetID'/></ViewFields></View>";

                ListItemCollection collListItem = list.GetItems(cq);
                context.Load(collListItem);
                context.ExecuteQuery();

                foreach (ListItem oListItem in collListItem)
                {
                    if (oListItem["SharenetID"] != null)
                        spDocs.Add(oListItem["SharenetID"].ToString());
                }

                // Anything in the source set that never made it to SharePoint gets logged,
                // along with the source file's size (or a note if the file is missing).
                foreach (var xid in xmlIds.Keys)
                {
                    if (!spDocs.Contains(xid))
                    {
                        log("=========================================");
                        log("Didn't find On SP: " + xid);
                        log(xmlIds[xid]);
                        var sizeString = "File not found....";
                        if (File.Exists(xmlIds[xid]))
                        {
                            sizeString = new FileInfo(xmlIds[xid]).Length.ToString(); // file size in bytes, not path length
                        }
                        log(sizeString);
                    }
                }
            }
Performance: performance was acceptable by default. I was able to increase it by implementing multi-threading, and found the best throughput with around 5-8 threads. You could also simulate multi-threading by running multiple instances of the importer on different physical folders.
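The multiple-instance approach needs nothing more than a small launcher; roughly like this (folder paths are illustrative, switches copied from the .bat above):

using System.Diagnostics;
using System.Linq;

class ParallelImport
{
    static void Main()
    {
        // Launch one importer process per source folder and wait for all of them;
        // around 5-8 concurrent instances gave the best throughput in my runs.
        string[] folders = { @"E:\split\part1", @"E:\split\part2", @"E:\split\part3" };
        var jobs = folders.Select(f => Process.Start(
            "OrbitOne.SharePoint.Importer.CommandLine.exe",
            "-site:\"http://mysite.com/programmgmt\" -documentlibrary:\"Program Documents\" " +
            "-username:xxx -password:xxx -domain:corp -CreateFolders -folder:\"" + f + "\""))
            .ToList();
        jobs.ForEach(p => p.WaitForExit());
    }
}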

That's all I can think of at the moment. If there are any questions, please post here and I will reply as best I can.
Mar 7, 2013 at 1:51 PM
Hi Tyler,

Great to hear that you used the tool successfully, and thanks for sharing your improvements and thoughts. I am not working on this project anymore, but I'll contact some people with the suggestion to incorporate them in a new release.